Thank you all for coming out tonight and giving me the opportunity to talk about our scientific results to a somewhat different group of people from the ones I normally talk to. As Bruce mentioned in his introduction, I wanted to start by talking a little bit about the New York Times. Those of you who read the New York Times science page, or the Washington Post or the Guardian, or look at National Geographic or any of these other publications, will have seen headlines like this. It's a very nice article that came out about a year ago by one of the best science writers in the area of evolution, Carl Zimmer, about reconstructing the tree of life. If you were to click through and look at that article, you would see an image that looks like this, showing the evolutionary relationships among 3,000 different species sampled from across the globe. If you look closely at this picture, you'll see that most of this tree is made up of single-celled organisms: bacteria, and primitive single-celled organisms called archaea. The organisms that we know best, including animals, plants and fungi, are all down in this little tiny corner here, and the animals that we identify with the most, such as vertebrates and mammals, are so small that they don't even get a label on this graph. So some of these pictures are quite remarkable. You can see from this picture that everything traces back to a single universal ancestor of all living things, which would have lived about 3.5 billion years ago, as best we can tell from the fossil record and from genetic analysis.

If you continued to look through these sorts of popular publications, you'd see a number of different articles about evolution: dinosaur evolution, evolution of influenza, fruit fly evolution and the way that natural selection influences fruit flies, evolution of lice, co-evolution in mammals and dinosaurs, and of course many articles about human evolution: the evolution of mountain populations
and their adaptations to high altitude, the peopling of Australia and of the Americas. And then of course many articles about one of my favorite topics, Neanderthals. One of these articles actually describes a paper that we published a little more than a year ago, and I'm going to tell you about that today. Now of course, if you're less selective about your sources, you might encounter some articles like this one, which also happens to be about our paper, I'm sorry to say. This is an actual news report in the Huffington Post. Talk about a coarsening of public discourse. It's rather embarrassing, and this is even before the Trump era.

All right, but what I want to talk about today is: how do we actually know this stuff? These findings are obviously astonishing, these stories about human evolution, about dinosaurs, about the tree of life. But how can we figure this sort of thing out from modern-day evidence? The short answer I'm going to give you is a term called molecular phylogenetics. Phylogenetics comes from the term phylogeny, introduced in the mid 19th century by Ernst Haeckel, a German biologist. It essentially means the genesis and evolution of a phylum, or a branch of life. Molecular refers to the analysis of DNA sequences, for the most part these days, but also protein sequences (the sequences of amino acids that make up proteins), RNA sequences and other biomolecules. So in this talk tonight, I'm going to try to give you a sense for what this field is, how it developed over the last 50 years or so, and then, toward the end of the talk, what it can tell us about our own ancestry and our relationship to Neanderthals.

So like any good academic, I'll start by establishing my credibility. I've been studying molecular phylogenetics for a long time. I got very interested in this topic right after college in the early 1990s, when I was working at Los Alamos National Laboratory studying HIV, and I discovered this whole world of making use of computers to reconstruct the past and became
fascinated by it. Over time we've published papers describing the evolutionary relationships among cultivated plants. We've described processes by which bacteria transfer DNA from one strain to another; this is a process called horizontal transfer. We've studied complex families of genes that evolve by duplication and loss as well as through speciation. We've studied small RNAs in fruit flies, and many, many other topics. But there's a collection of core techniques that we've used again and again throughout these sorts of analyses, and they rely on modeling the evolution of DNA sequences along the branches of an evolutionary tree, or a phylogeny. So I'm going to try to tell you a little bit about how this area came about and how it works.

No talk about phylogenetics would be complete without this slide. How many of you have seen this picture? All right, quite a few, maybe 20% or so. This is a famous image. It's claimed to be the first phylogeny. This was actually drawn by Charles Darwin in his famous Notebook B in about 1837, so that's about 20 years before On the Origin of Species was even published. He was doodling pictures of evolutionary trees in his notebooks, and he realized, as soon as he started to think about these processes by which one species would evolve from another, that this would give rise to a branching structure: as you approach the present day, you would have a series of branching operations that would lead to a kind of family tree among all living species today. Darwin was still quite taken with this idea by the time he published On the Origin of Species, and this is actually the single figure in that book. If you flip to the very back, you'll see this image here. It's not even numbered, because there's only one figure in the book; it's mostly text. And at times Darwin spoke rather poetically about this image of the tree. He talked about how limbs divided into great
branches were themselves once, when the tree was small, budding twigs, and so on and so forth. He wasn't the first one to think in terms of trees. You can see precursors of this idea of the tree in the work of Linnaeus and Lamarck and others. But Darwin was the first one to unify this idea of evolution with a tree, and to realize that it would imply that all life on earth was related by a single tree.

So for many years biologists tried to build trees, but not having DNA sequences, they had to work with observable traits. This was an area that became known as cladistics. Biologists would identify particular characters, phenotypic or morphological characters, in organisms and try to come up with branching relationships that would only require those characteristics to emerge once. So for example, they imagined that a vertebral column would emerge once, and that would separate a lamprey from a lancelet, and then jaws would emerge once, and they would separate a tuna from a lamprey, and so on and so forth. In that way they were able to get a pretty good idea of what the tree of life might look like. But of course there were many evolutionary questions that were difficult to resolve, parts of the tree that were difficult to work out because there weren't good physical characteristics that separated one group of organisms from another.

The key development that people tend to point back to in the emergence of molecular phylogenetics is an observation by Linus Pauling and Emile Zuckerkandl in the early 60s. They were studying the hemoglobin protein, and they were looking at hemoglobin proteins that they had sequenced from various different species. They knew something about the evolutionary relationships among these species and about how long ago they must have diverged, based on the fossil record, and they noticed that the numbers of differences in these amino acid sequences were roughly proportional
to the estimated evolutionary time since these species had diverged. So they introduced this idea of a molecular clock, a clock that's ticking over time, laying down new mutations on these amino acid sequences. Those mutations accumulate over time, so that things that are more distantly related have more mutations between them and things that are more closely related have fewer mutations between them.

Their idea looks something like this. You would have a gene in some ancestral species. That species would split through some sort of speciation event into two daughter species. Maybe one group of organisms in the population would migrate to the other side of the river and stop interacting with the other subset of that species, and over time they would diverge from one another into two subspecies. Then those subspecies would begin to accumulate mutations separately. So now, if you were to compare a protein from one of them with a protein from the other, you might see that there were two mutations unique to this one and two mutations unique to that one. As time goes on, more mutations would accumulate, perhaps additional speciation events would occur, and you'd have more and more differences accumulating between the proteins present in these individual species. So now if you looked at the proteins from species B and species C, they would differ at only a few places, but the proteins from B and C would differ from the protein from species A in many more locations. In this way you could start to imagine reconstructing an evolutionary tree by counting up the numbers of differences in these proteins.
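The counting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three aligned fragments below are invented for the example (they are not real hemoglobin sequences), and the function names are hypothetical. The sketch counts differing positions for each pair of species and picks out the pair with the fewest differences, which under a clock-like model would be the most recently diverged pair.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def pairwise_differences(seqs):
    """Map each pair of species names to its number of differing positions."""
    return {
        (a, b): count_differences(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned protein fragments, invented purely for illustration.
sequences = {
    "A": "MKTLLVAGSE",
    "B": "MRTLIVAGTD",
    "C": "MRTLIVSGTD",
}

distances = pairwise_differences(sequences)

# The pair with the fewest differences is the candidate for the most recent split.
closest_pair = min(distances, key=distances.get)

for pair, d in distances.items():
    print(pair, d)
print("closest pair:", closest_pair)
```

With these made-up fragments, B and C differ at far fewer positions than either does from A, mirroring the species A, B and C picture in the talk. Real analyses also have to correct for multiple mutations at the same position, which raw counts ignore.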
That was the core idea introduced by Zuckerkandl and Pauling. If you look at modern-day data for a particular protein, in this case the cytochrome C protein from a number of different organisms (here we're focusing on a number of mammals), and you plot the estimated number of years since two species diverged from a common ancestor, as estimated from the fossil record, against the number of substitutions (in this case they're going to be DNA substitutions rather than amino acid substitutions, but the principle is the same), you would see an approximately linear relationship between those two properties. In this way these mutations can be thought of as a kind of clock that we can use to date the time since things diverged, and also to reconstruct the shapes of the evolutionary trees that describe their relationships.

Zuckerkandl and Pauling's observations were really just empirical. They just noticed this property of proteins. They didn't really give a recipe for how to reconstruct a phylogeny from this sort of data. But a few years later this became a very active area of research, and one of the pioneers in this area was the Italian human geneticist Luca Cavalli-Sforza, who collaborated closely with a British statistician,
Anthony Edwards, and they came up with the first recipes for using this sort of data to reconstruct a tree that would show how closely different organisms were related and how long ago they might have diverged. Over the next ten years or so, a large number of different types of techniques were proposed for reconstructing these trees. I want to show you what one of them looks like. This turns out to be one of the most intuitive and easy to understand, and also one of the most powerful, techniques for reconstructing evolutionary relationships. It's called parsimony, because it tries to find an evolutionary history that minimizes the number of changes required to explain the observed data, and I'll show you what I mean by that as we go forward.

Imagine we have three species, one, two and three, and for simplicity let's imagine we know their ancestral sequence. Maybe we can infer it by looking at a distantly related relative. We want to find the evolutionary relationship among species one, two and three. Let's focus first on just one variable position in those sequences. This is known as a site in the literature. At this particular position, species one and two have a C and species three has an A. Now we're going to consider that there are three possible evolutionary relationships among those three species: either one and two could be most closely related, with three as an outgroup; or one and three could be most closely related, with two as an outgroup; or two and three could be most closely related, with one as an outgroup.
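To make the comparison of these three trees concrete, here is a minimal Python sketch that counts the fewest mutations each tree needs at the single site just described (species one and two have a C, species three has an A), with an A assumed at the root. The nested-tuple tree encoding, the function names, and the choice of a Sankoff-style dynamic program are assumptions made for illustration; the talk does not say which algorithm was used on the slide.

```python
STATES = "ACGT"
INF = float("inf")


def min_cost_by_state(tree, observed):
    """For a subtree, return {state: fewest mutations below this node
    if the node itself carries that state}."""
    if isinstance(tree, str):  # a leaf is just a species name
        return {s: (0 if s == observed[tree] else INF) for s in STATES}
    child_costs = [min_cost_by_state(child, observed) for child in tree]
    return {
        s: sum(
            min(costs[t] + (0 if t == s else 1) for t in STATES)
            for costs in child_costs
        )
        for s in STATES
    }


def parsimony_score(tree, observed, root_state):
    """Fewest mutations needed to explain one site, given the root state."""
    return min_cost_by_state(tree, observed)[root_state]


# The three possible rooted relationships among species one, two and three.
trees = {
    "tree 1": (("1", "2"), "3"),  # one and two closest, three as outgroup
    "tree 2": (("1", "3"), "2"),  # one and three closest, two as outgroup
    "tree 3": (("2", "3"), "1"),  # two and three closest, one as outgroup
}

site = {"1": "C", "2": "C", "3": "A"}  # the site described in the talk

scores = {name: parsimony_score(t, site, root_state="A") for name, t in trees.items()}
for name, score in scores.items():
    print(name, score)
```

Summing `parsimony_score` over many sites for each tree gives the total event counts that parsimony compares. A single site is weak evidence on its own, since multiple mutations at one position do happen.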
Those are the only possible relationships among three species. Now let's try to imagine the minimal sequence of mutations that could explain the observed data if we assume that there's an A at the root of the tree. Well, if there's an A at the root of the tree, then we can explain this data under the first tree with just one mutation, from an A to a C, along this branch leading to species one and two. Does everybody see that? If there were a mutation there, it would lead to a shared C in species one and two, while species three would still have an A. If we try to explain the same data using tree two or tree three, we can only do it with a minimum of two mutations. It requires two mutations to explain this pattern under tree two, and it requires two under tree three. Now, that doesn't necessarily mean tree two or tree three are wrong. There are going to be cases where multiple mutations happen at a site. But if we systematically see, across all positions in the data, that one tree is supported more than the others, that gives us a strong belief that it must be the true evolutionary relationship among those species.

In this case, if we look at the other sites, sites two, three and four, and similarly try to match them up with the trees (I'm not going to go through all the details), we find that actually none of these sites strongly supports one tree over the others. They all require five mutations across these three sites to explain. But if we had a large number of sites, we could add them up and ask: what's the total number of events
required under each of these trees in order to explain the observed data? In this case we get six events under tree one, seven events under tree two and seven events under tree three. So that gives us some confidence that tree one is most consistent with the data. Now, in this case, maybe not a whole lot of confidence. Maybe this is not the greatest example. But in real examples we would look at hundreds or thousands of sites and hope to see many dozens or many hundreds of cases where you prefer one tree over the others. And in practice, that's what people do when they analyze these data.

So here's an example. This is a famous example. One of those cases that I mentioned, where morphological characters had difficulty resolving a question of evolutionary relationships, is the case of the great apes: in particular, the question of whether humans, Homo sapiens, are more closely related to chimpanzees, Pan troglodytes, or to gorillas, Gorilla gorilla. This was a problem that plagued taxonomists for many years, because there are many derived traits among all three of those organisms, and it wasn't clear which two were more closely related to each other than to the third. By the late 1980s, Morris Goodman and his group had obtained quite a lot of sequence data for the time, tens of thousands of DNA nucleotides from each of these species. This was in the region of the beta-globin gene. They used those to perform the sort of parsimony analysis that I just told you about, and what they found was that the best tree was the one that I'm showing here, which groups humans and chimps, with gorillas as an outgroup. That tree required 383 different substitution events, nucleotide mutations, and they mapped those to the branches of the tree. There are quite a few, not a huge number but a significant number, that support that grouping of humans and chimps. These are mutations that are shared by humans and chimps and not shared by the other great apes. Right, and this
gave them quite a lot of confidence that this was the true evolutionary relationship among these species.

Another one of my favorite examples, also a classic in the phylogenetic literature, has to do with the Cetacea: whales, dolphins and porpoises. As many of you know, whales are mammals. But it's not obvious how they relate to other mammals, because they're morphologically so distinct. They're so highly diverged from other mammals. So this was a problem that plagued taxonomists for many years as well. In a number of papers in the late 90s, most notably this one by a Japanese group in 1999, researchers obtained sequence data from toothed whales and baleen whales along with many other mammals, and they showed very clearly that the closest relatives of whales were hippopotamuses. This was quite striking. The whales, dolphins and porpoises trace their ancestry to an ancestor shared with hippopotamuses about 50 million years ago. This is something that is now fairly well supported by the fossil record as well. It appears that this evolutionary divergence happened on the Indian subcontinent. It actually started, probably, with a terrestrial mammal, and sometime later they made their way into the ocean.
So the fact that hippopotamuses are aquatic as well is an example of convergent evolution. It's believed from the fossil record that their ancestors were terrestrial.

Okay. Another great figure from the early days of molecular phylogenetics was a guy named Allan Wilson. I want to focus on Wilson in particular here because he was especially interested in the evolution of humans and of the great apes. He was a very prolific author throughout the 1960s, 70s and 80s, and one of the pioneers in obtaining sequence data from humans and other apes and finding out the relationships among those species. He is also important in that he trained a number of important people in this field, including Svante Pääbo, a person I'm going to tell you about a little bit later, one of the pioneers in Neanderthal DNA sequencing. Allan Wilson also trained Mary-Claire King, the discoverer of the BRCA1 breast cancer gene, which some of you might have heard of. So he was a very influential person in genetics and evolution during this period. Actually, if you look closely at this picture here, you can see he's drawing molecular clock pictures. This is cytochrome C.
There's hemoglobin, and there are a few others. He's drawing pictures like the one I just showed you, about how as time goes on proteins diverge in a roughly linear fashion.

So Wilson and his colleagues Rebecca Cann and Mark Stoneking published a very important paper, also in the late 80s, this time in Nature. This was really the first large-scale study of human evolution based on mitochondrial DNA. They collected samples from 147 different people from around the world, sequenced their mitochondrial DNA, and then built a big evolutionary tree, using parsimony methods like the ones I just told you about, describing how those individuals were related. You can't see what I'm showing you there, so I'm going to zoom in a little bit on a subset of these individuals. They looked at multiple populations from around the world: Africans, Asians, Australians, Europeans. What they found was that most of these non-African groups, such as the Europeans, formed clades; they formed clusters on the trees that they were able to reconstruct. But the Africans almost invariably fell outside of the variation in these non-African subgroups, and that very strongly suggested that Africa was the original source of human genetic diversity and that these various groups had emerged out of Africa sometime after the African diversity had already been established. This has been supported now, as I'll show you, by many, many subsequent studies. In general, we see much greater genetic diversity within Africa than we see in these non-African populations, and the non-African populations typically represent subsets of the genetic diversity that had been present in Africa and then moved out, possibly in multiple colonizations. You can see this in their abstract.
They actually mention multiple origins for non-African populations, and we'll see later in my talk that that's something that has persisted to this day, and something that our work tends to support. Another piece of this study was that they obtained an estimated date for the divergence of all of these populations, and they estimated it at about 200,000 years ago. That turns out to be a date that also holds up pretty well. We'll come back to that as the talk goes on. This led to the terminology mitochondrial Eve. Some of you may have heard of this. The idea is that all people on earth can trace their maternal inheritance back to one woman who lived in Africa about 200,000 years ago, and she would be mitochondrial Eve. I neglected to tell you (some of you may know this) that mitochondrial DNA is inherited from your mother only. It's a maternally inherited molecule, whereas most of your DNA is inherited from both parents. So when you reconstruct the history of human populations using mitochondrial DNA, you're reconstructing only your maternal history, and these results referred only to that.

All right, so throughout the 1990s people continued to work hard on these phylogenetic methods for understanding human populations. A particular pioneer in this area was this guy, Luca Cavalli-Sforza, whom I mentioned earlier as one of the pioneers of developing phylogenetic methods. By this time he was at Stanford, and he carried out a very ambitious research program, traveling around the world obtaining samples from people and studying them using phylogenetic methods, including mitochondrial DNA, Y-chromosomal DNA and DNA from the rest of the genome. He was also a pioneer in comparing and contrasting his genetic findings with what could be found through the study of linguistics and through the study of cultures and so on and so forth. And he wrote a very important book.
I think it came out in 1994. It really captured the state of the field at that time. I'm not going to go through these individual papers; instead, I'm going to give you a summary of what was known about human evolution around 2000. This is actually taken from a review article by Cavalli-Sforza and his colleague Marc Feldman from 2003.

So at this time, roughly 15 years ago, it was essentially established, to their best guess using the data they had available, that anatomically modern humans had emerged around 200,000 years ago, probably in East Africa, although there were some who argued for South Africa. By about a hundred thousand years ago these groups had begun to split and spread out across the African continent and give rise to the different African populations that we see today, for example the Bantu of Northern and Western Africa and the San of Southern Africa. Then, by around 60,000 or 70,000 years ago, one or more waves of migration occurred off of the African continent. These early humans began to populate the rest of the world through several different paths. There was at least one southern migration to the east, at least one northern migration to the east, and at least one migration to the west. There are quite early remains in Australia, going back as long as 60,000 years, and there are remains of anatomically modern humans in China that also go back 60,000 years.
So these are quite early colonizations. The evidence in Europe was for a slightly later colonization, about 40,000 years ago. And then of course the peopling of the New World was considerably later. It required crossing the Bering land bridge, probably 15,000 to 20,000 years ago, and again this appears from subsequent work to have occurred in multiple waves rather than in one wave of colonization.

All right, so this was essentially what was known at that time. And then around 2008 or so, this game really began to change dramatically, and it changed because of DNA sequencing technologies. A new type of technology for obtaining DNA sequences very, very cheaply and in very high volumes began to emerge in the mid 2000s, and it became clear that we could start to obtain complete genome sequences from individuals across the globe. The culmination of this effort was a project called the 1000 Genomes Project, which has now obtained very high quality complete genome sequences for several thousand humans from multiple populations across the globe. As this became possible, it became clear that we no longer had to restrict ourselves in these sorts of studies to mitochondrial DNA or Y-chromosomal DNA. We could study complete genome sequences for humans and use those to try to understand our evolutionary history.

All right, so that sounds good. More data is usually good. But it turns out that in this case more data leads to some significant complications, and I'm going to try to give you a little bit of a sense for how this problem becomes more difficult when you look across the entire genome rather than looking, say, just at the mitochondrial genome or just at the Y chromosome, which are inherited as units: Y chromosomes paternally and mitochondrial DNA maternally.

Okay. So one issue is that we have two copies of every chromosome. If you look at one of my genes, say my hemoglobin gene, I have a copy that I inherited from my mother and a copy that I inherited from my father,
and those copies have different evolutionary histories, in the same way that my mother and my father have different evolutionary histories. So suppose we look at a collection of individual chromosomes from modern-day individuals. We're going to count backwards in time, so time zero is the present day. Now we can think of each individual as having two tips in that tree, right? So the blue individual has a tip here and a tip there. One is the maternal copy and one is the paternal copy of the particular gene that we're looking at. Same for the green individual, and same for the purple individual. We can then trace backwards in time and build a phylogeny for all of those individual chromosomes, but it's no longer at the level of individuals. It's now at the level of chromosomes. Okay, so that's one complication: if I build an evolutionary tree for a single gene in the genome, I have to keep track of the fact that each individual has two copies of that gene.

Where it gets really complicated is when we think about the problem of recombination. Some of you might remember from your high school biology class, and it's okay if you don't, that when your cells go through a process called meiosis, the process of cell division that leads to sperm and egg cells, the paternal and the maternal chromosomes swap genetic material with one another. So if this is the paternal and this is the maternal chromosome, they cross over, and some material from the maternal chromosome ends up on the paternal chromosome and vice versa. That happens every generation, on essentially every chromosome. What that means is that over time
the different genes on a chromosome will have different evolutionary histories. If I look at my hemoglobin gene, it's going to have one evolutionary history (a different one for my maternal and for my paternal copy, but one evolutionary history for each of those). If I then go to my cytochrome C gene, because it's in a different location on the genome and things have been shuffled by the process of recombination, it's going to have a different evolutionary history. So at every position along the genome, I'll have a different tree describing the relationships among the chromosomes at that position.

Now this turns out to be good and bad. It's bad in that it makes things very complicated to study. When I try to reconstruct evolutionary trees from population samples of humans, I have to deal with this nasty problem of the tree changing as I go along the chromosome. But it's good in that I'm actually sampling a much larger portion of my ancestry. Remember, with the Y chromosome I'm only looking down one lineage. I'm looking at my father, my father's father, my father's father's father and so on. I'm only looking down one lineage of all my possible ancestors, and similarly with the mitochondrial genome. In this case, at every locus I'm sampling a different set of ancestors, because things have been swapped around in different ways by this process of recombination. So it potentially gives me a lot more information about my ancestry: a lot more information about how long ago different populations might have diverged, a lot more information about gene flow between populations, as we'll see in a moment, and more information about how large ancestral populations might have been.

So let me talk a little bit about this issue of gene flow, because that's where I'm trying to take you with this whole study, just like the Huffington Post said. Imagine that we have two completely genetically isolated populations. Let's say they live on separate islands and they don't have any technology for getting between the islands,
and they diverged some number of generations ago that we'll call tau. Now if I sample an individual from each of those populations at a single locus and I trace them back, then they're going to find some common ancestor, and that common ancestor will vary from one position along the genome to the next because of historical recombination, just as I was telling you. But if it's true that those two populations have been completely isolated genetically, then it has to be at least as old as tau. Right? When I trace back to their common ancestor, it has to be in this ancestral population, before the two were isolated from one another. However, if there has been some gene flow between those two populations, if some of these guys have been finding rafts and sneaking over to these guys, then I'm going to have some places along the genome where their common ancestry is younger than the split between the two populations. So if I look across the genome at many different locations and I see that most of the ancestry is old, but there's an occasional position along the genome with very recent ancestry, that's a telltale sign of gene flow between two populations. And that is essentially the signal that we look for when we study these ancient interbreeding events.

All right, now I'm going to have to start to skip over some details, because the methods that we actually use get fairly complicated. But I want to tell you, at a high conceptual level, essentially what we're doing. My group got interested in this problem about seven or eight years ago, and we wanted to model this problem of finding common ancestry along complete genomes, allowing for the patterns of ancestry to change from one position in the genome to the next. So we set it up in the following way.
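As a brief aside before the setup, the isolation argument above can be sketched in Python. This is a toy illustration only, under loud assumptions: the per-locus coalescence times and the split time `tau` below are invented numbers, the function names are hypothetical, and in real data coalescence times are never observed directly but have to be inferred with uncertainty. The sketch simply flags loci whose common ancestor is younger than the population split, which is the signal of gene flow described in the talk.

```python
def loci_younger_than_split(coalescence_times, tau):
    """Return indices of loci whose common ancestor is more recent than tau.

    Under complete isolation, no locus should coalesce more recently than tau.
    """
    return [i for i, t in enumerate(coalescence_times) if t < tau]


def fraction_younger(coalescence_times, tau):
    """Fraction of loci with common ancestry more recent than the split."""
    if not coalescence_times:
        raise ValueError("need at least one locus")
    return len(loci_younger_than_split(coalescence_times, tau)) / len(coalescence_times)


# Invented values, in generations, purely for illustration.
tau = 1000
times_per_locus = [1500, 2300, 400, 1800, 1200, 950, 3100]

suspect_loci = loci_younger_than_split(times_per_locus, tau)
print("loci younger than the split:", suspect_loci)
print("fraction:", fraction_younger(times_per_locus, tau))
```

A hard threshold like this is far too crude for real genomes. Fitting a statistical model with an explicit gene-flow parameter is a more principled way to ask the same question.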
We collect DNA sequences for many locations across the genome. We have one or more representatives of several populations. We propose some branching relationship among those populations. We can try several if we're not sure what it is, but sometimes we have enough information from the fossil record that we have a pretty good idea of what that relationship is. So for example, if these were Europeans and West Africans, then these might be South Africans. We know essentially from other studies that that's their general relationship with one another. Then, using the computer, we explore many, many possible trees consistent with the data across the genome, and we adjust the parameters of this model, the times since these populations diverged and the amounts of gene flow between populations, until they best fit the data. We do that by exploring millions of these possible genealogies across tens of thousands of DNA sequences drawn from the genome, and we make use of techniques drawn from statistical physics, called Monte Carlo techniques, that let us explore the space of possible genealogies in a principled way. At the end of the day the computer gives us a model, and it tells us which model best fits the data and how much confidence we have in the individual parameters of that model. Some of these genealogies will involve gene flow between populations and others won't, and we can turn a knob: there's a parameter that describes how much of that gene flow there is. So we can test the possibility of gene flow or the possibility of not having gene flow.

Okay, so the reason we were particularly interested in this was that we had some collaborators, in about 2009, published in 2010, who obtained complete genome sequences from some southern African representatives. In particular, we were interested in this complete genome sequence for a representative of this hunter-gatherer population from the Kalahari Desert known as the Khoisan or the San. The early
work by Cavalli-Sforza and others had shown, from mitochondrial DNA and Y-chromosomal DNA, that the San seemed to be a very early branching group, probably the earliest branching group of all living populations on earth today. But the data were very sparse, and they were limited to paternal or maternal histories. So we set out to see whether we could figure out how old this population was by using these statistical sampling techniques across the entire genome. I think I forgot to tell you the name of our program. The name of our program is G-PhoCS; it stands for Generalized Phylogenetic Coalescent Sampler. So we wanted to apply G-PhoCS to these data and see what we could say about how old the San were. The way we did this was as follows. At the time there were only a few complete genome sequences for multiple populations across the globe, but we had a Korean individual, a Han Chinese individual, a European individual, a West African Yoruban individual, and a San individual, and we assumed the tree that I'm showing here.
This was based on Cavalli-Sforza's data and other data. We could also test alternative trees and make sure that this was the one that fit the data best. We allowed for gene flow between some of these populations, and then we tried to see whether we could estimate how old these splits were between the different groups. We focused in particular on two splits: the split between the San and the others, that was the one I mentioned, the very old one that we're most interested in, and then the split between the West African Yorubans and all of the non-African populations. That would be a proxy for the time when these non-African groups migrated out of Africa and colonized the rest of the world. That would give us a pretty good estimate of when that colonization event might have happened. That's known as the out-of-Africa migration. And what we came up with, after very careful analysis for many, many days, was the following estimates. We estimated the age of the San split to be about 200,000 years ago. Now that's really pretty old. That's as old as Allan Wilson's estimate of mitochondrial Eve. So the San, according to our estimates, go back about as far as mitochondrial Eve would go back. That's actually not surprising. Mitochondrial Eve is the maternal ancestor, but for reasons I won't go into it's not too surprising that the maternal ancestor would be close to the divergence time of that San split. So that was encouraging. Our estimate of the out-of-Africa event, the African-Eurasian divergence, the AE divergence, was 70,000 to 80,000 years, and that fit fairly well with archaeological findings in the Middle East and with a number of other arguments people had made on the basis of both genetic and archaeological evidence.
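As an aside, the sampling idea described above (propose a value for a split time, score it against the data, and keep or reject the proposal) can be illustrated with a toy Metropolis sampler. This is a minimal sketch and not the speaker's actual program: it assumes a single split time, no gene flow, no ancestral variation, and Poisson-distributed difference counts per locus. The constants `MU`, `LOCUS_LEN`, and `GEN_YEARS`, the simulated data, and all function names are hypothetical choices made only for illustration.

```python
import math
import random

MU = 1.25e-8        # assumed mutations per site per generation (illustrative)
LOCUS_LEN = 1000    # assumed locus length in base pairs
GEN_YEARS = 25      # assumed years per generation

def poisson(rng, lam):
    """Draw a Poisson count with Knuth's method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_diffs(t_gen, n_loci, seed=1):
    """Simulate per-locus difference counts for a split t_gen generations ago."""
    rng = random.Random(seed)
    lam = 2.0 * MU * t_gen * LOCUS_LEN  # expected differences per locus
    return [poisson(rng, lam) for _ in range(n_loci)]

def log_likelihood(t_gen, total_diffs, n_loci):
    """Poisson log-likelihood of the data, up to a constant, for split time t_gen."""
    lam = 2.0 * MU * t_gen * LOCUS_LEN
    return total_diffs * math.log(lam) - n_loci * lam

def metropolis(diffs, n_steps=20000, step=200.0, seed=0):
    """Sample split times (in generations) under a flat prior on t > 0."""
    rng = random.Random(seed)
    total, n = sum(diffs), len(diffs)
    t = 1000.0                           # deliberately poor starting value
    ll = log_likelihood(t, total, n)
    samples = []
    for _ in range(n_steps):
        proposal = t + rng.gauss(0.0, step)
        if proposal > 0:                 # non-positive times are rejected outright
            ll_new = log_likelihood(proposal, total, n)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                t, ll = proposal, ll_new
        samples.append(t)
    return samples[n_steps // 4:]        # discard the first quarter as burn-in

diffs = simulate_diffs(t_gen=8000, n_loci=2000)
posterior = metropolis(diffs)
mean_gen = sum(posterior) / len(posterior)
print(f"posterior mean split time: {mean_gen * GEN_YEARS:,.0f} years")
```

The spread of the retained samples plays the role of the confidence attached to each parameter; a real analysis would sample genealogies, several split times, and gene flow rates jointly rather than one number.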
So we were quite encouraged by these findings, but they did indicate that the San are really quite an old population. Note that this time is about three times as long ago as this time. That meant that the divergence of this San group in southern Africa was about three times as old as the split between the West Africans and the Europeans. It's a very old group. There has been some gene flow between the West Africans and the South Africans, and we can detect that in our framework, but they've been remarkably isolated, probably because of this hunter-gatherer lifestyle, living in the desert, and their tendency not to mix with the farming populations nearby. I just wanted to mention very briefly that there was a recent study that came out just last week. This is not yet published in a journal, but it came out on Cold Spring Harbor's preprint server, known as bioRxiv. This is a group that analyzed some data similar to the data we analyzed, but they combined it with some ancient genomes, some Iron Age farmer genomes and some Stone Age hunter-gatherer genomes, ranging between 300 and 2,000 years old. So these were remains that they dug up in South Africa. They obtained DNA from these remains, sequenced that DNA, and analyzed it together with modern-day genomes for a number of different populations. And they actually ran our program G-PhoCS on these data, and they also made use of their own method, which analyzes only pairs of genomes together.
I don't want to go into all the details of their study, but they're estimating that this date, for the split of the San, which are here, and the other African populations, might be 260,000 years old or even older than that. I have some questions about exactly how they did the analysis, so we'll see how that holds up when this paper is peer reviewed, but it's reasonably consistent with ours, and it's not surprising that with the acquisition of this ancient DNA the date might get pushed back even farther. One other caveat I wanted to give here, without going into a lot of detail, is that this molecular clock I've been telling you about is actually kind of a fiction. There actually isn't one molecular clock; there are many molecular clocks. The rate at which mutations occur varies across human individuals, and it varies quite considerably between males and females. Because of the different processes by which sperm and egg cells are generated, it's age dependent in males and much less age dependent in females. What that means is that old males who become parents make a very disproportionate contribution to the numbers of mutations that occur in their offspring.
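To make the calibration caveat concrete, here is a small sketch of how an assumed mutation rate and generation time turn a measured genetic divergence into a date. It uses the simplest possible clock, in which two lineages each accumulate mutations at the same rate since the split and ancestral variation is ignored. The divergence value, the rates, and the generation times below are illustrative assumptions, not figures from the talk, and `divergence_time_years` is a hypothetical helper.

```python
def divergence_time_years(per_site_divergence, mu_per_gen, gen_time_years):
    """Convert per-site divergence between two lineages into years.

    Assumes a strict clock: divergence = 2 * mu * T generations,
    ignoring variation already present in the ancestral population.
    """
    generations = per_site_divergence / (2.0 * mu_per_gen)
    return generations * gen_time_years

# Same hypothetical data, three different calibrations.
d = 2e-4  # assumed differences per site between two populations
for mu, gen in [(1.25e-8, 25), (2.5e-8, 25), (1.25e-8, 29)]:
    years = divergence_time_years(d, mu, gen)
    print(f"mu={mu:.2e}, generation={gen}y -> {years:,.0f} years")
```

Doubling the assumed mutation rate halves the inferred date, and lengthening the assumed generation time stretches it, which is why the dates are best read as ballpark figures.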
That's one of the reasons why you see a paternal age effect in diseases like autism: it's because of the higher accumulation of mutations in the sperm cells of older males. Anyway, I didn't want to go into all the details here, but I want to make the point that when we try to calibrate these dates, and we try to use genetic data to estimate how old populations are, we're using very crude averages over mutation rates across humans, and some of these factors have probably changed over time. Generation times may have changed, the ratio of male and female ages at the time of reproduction depends on the culture in which the reproduction is occurring, and so on and so forth. So that's one of the reasons why there's a lot of uncertainty about the precise dates that we get out of these genetic analyses. Nonetheless, we can be fairly confident about ballpark estimates. Okay, in the few minutes that I have left, I want to start to talk a little bit about Neanderthals. And I want to start by introducing you to Svante Pääbo, who is probably the most famous person in the field of Neanderthal genetics. He's been here at Cold Spring Harbor many times, given many talks, almost always about Neanderthal genetics, not always but almost always. Svante has been fascinated with ancient DNA for decades and has really dedicated most of his career as a scientist to devising new techniques for obtaining DNA from ancient samples, correcting errors in that DNA, and then analyzing that DNA to tell us something about our history. I mentioned that he worked early in his career with Allan Wilson at Berkeley. Later on he moved back to Europe, and for a couple of decades now, I think, he's had his own institute in Leipzig, Germany, a Max Planck Institute, where they do some of the world's best work in this field of ancient DNA. So Svante had been studying Neanderthal DNA for a number of years and had made some initial progress in obtaining mitochondrial DNA from Neanderthals, but also had some setbacks:
there had been some high-profile cases where they had published what they thought was Neanderthal DNA that turned out to be contaminated by modern human DNA. It's very difficult to avoid that sort of contamination. He went back to the drawing board and came up with more rigorous techniques for obtaining DNA, and then finally, in 2010, his team had a major breakthrough. They were able to obtain a so-called draft Neanderthal DNA sequence for an entire genome. Now, at this point they were not able to sequence the genome of a single individual to high coverage. They had to combine DNA from three bones that were found in a single cave in Croatia. They compared it with samples that they had found in some other caves across Europe. But by combining this information and being very careful about DNA extraction, about sequencing, and about error correction, they were able to obtain a quite good draft-quality genome for a Neanderthal. Then they set about analyzing that genome, and the big story from this analysis was that there appeared to be strong evidence that Neanderthals and modern humans had interbred.
Probably about 60,000 years ago. I'm not going to go through all of the evidence that they presented in favor of this hypothesis, but I want to show you one finding that I think is quite striking and fairly easy to understand, if you'll bear with me for a moment. What we're going to do is take two genome sequences, a European genome sequence and an African genome sequence, and compare them to the newly sequenced Neanderthal genome on the x-axis and to the human reference genome on the y-axis. Now, the human reference genome is predominantly composed of DNA from Europeans, but it's not the same European as the one we're comparing, so there are still going to be quite a few differences between the European genome that we're using as a query and this human reference genome. Now, what they do for this plot is normalize, or standardize, the distances so they have an average of one. There are some overall differences between the European and the African in how similar they are to these two reference genomes, but they get rid of that by adjusting them so they have averages of one. Now, what you see when you look across the genome is that both the European and the African mostly have a positive slope here: where they're farther away from the Neanderthal genome, they're also farther away from the human reference genome. That just reflects the fact that the clock ticks at different rates in different places across the genome, so you're accumulating mutations at different rates at different positions across the genome. When the clock ticks faster, you tend to be more distant both from the Neanderthal genome and from the human reference, and when it ticks more slowly, you tend to be closer to both. But look at this strange anomaly down at the left-hand side in the European genome. This is a collection of sequences, a small fraction of the entire genome, but a significant fraction,
a collection of positions across the genome that are very close to the sequenced Neanderthal genome and very far from the human reference. Okay, sequences that look a lot like Neanderthal sequences, but are in a European individual and don't look anything like the reference genome, which is composed of a collection of different people. So these are sort of anomalous sequences. It's like alien DNA embedded in this European genome, which looks a lot like Neanderthal sequences and not like other European sequences. And it only appears in Europeans; you don't see it in Africans. It's a very strange observation, and if you do this plot with other populations from outside of Africa, such as East Asians or Americans or Papua New Guineans, you see the same sort of pattern: a small fraction of sites that look a lot like the Neanderthal DNA in humans. All right, so I'm not going to show you the other analyses that they did, but through a whole series of analyses a large team of researchers very convincingly showed that the only plausible explanation for this strange observation in non-African genomes is that non-Africans interbred with Neanderthals, probably about 60,000 years ago, after they had migrated off of the African continent. We know that it can't have happened in Africa because we see no sign of it among African populations. We also see no fossil record of Neanderthals in Africa. The Neanderthal range was predominantly in Europe, the Middle East, and Western Asia. So it would make sense that this band that migrated off of the African continent would have encountered Neanderthals somewhere in Eurasia, and the only way we can explain this strange observation in a fraction of their genome is if there was an interbreeding event. Okay, so I want to go on with the story. The next chapter in this story was the discovery of a new cave.
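For readers who want to see the normalization behind that plot spelled out, the following is a minimal sketch. It rescales each set of distances to have an average of one and then flags the regions that are unusually close to the Neanderthal genome while being unusually far from the human reference. The per-region distances, the cutoffs `near` and `far`, and the function names are all hypothetical; the published analysis was considerably more careful than this.

```python
def normalize(values):
    """Rescale distances so that their average is one."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def flag_candidates(dist_to_neanderthal, dist_to_reference, near=0.5, far=1.5):
    """Return indices of regions close to Neanderthal but far from the reference.

    `near` and `far` are arbitrary illustrative cutoffs on the normalized scale.
    """
    x = normalize(dist_to_neanderthal)
    y = normalize(dist_to_reference)
    return [i for i, (xi, yi) in enumerate(zip(x, y)) if xi < near and yi > far]

# Toy per-region distances for one query genome (made-up numbers).
# Most regions follow the positive trend; region 4 breaks it.
to_neanderthal = [10, 12, 8, 11, 2, 9]
to_reference = [5, 6, 4, 5.5, 12, 4.5]

print(flag_candidates(to_neanderthal, to_reference))
```

In this toy data only the fifth region (index 4) falls in the lower-left corner of the plot, which is the kind of outlier the talk describes in non-African genomes.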
The sampling of ancient DNA that Svante Pääbo and his team were doing was very much limited by the quality of the DNA they were able to obtain from these bone fragments. From many of the bone fragments that they found that appeared to be Neanderthal, they couldn't extract any DNA, and even the best ones were maybe one or two percent Neanderthal DNA and mostly bacterial DNA and contamination from modern humans. But then they found this beautiful cave in Siberia, in the Altai Mountains, called Denisova Cave, and they teamed up with some Russian archaeologists and began to explore some bones in that cave. Sorry, here it is. It's quite far to the east of these European Neanderthal findings, probably on the eastern side of the Neanderthal range. But they found some beautiful bones in this cave that had astronomically higher enrichments for Neanderthal DNA than anything they had seen before. In particular, they found this one very tiny finger bone. This is the distal manual phalanx.
So it's the tiny little fingertip bone that had a very good DNA sample, and when they obtained the DNA from this sample they came up with the amazing finding that it appeared not to be a Neanderthal. It appeared to be another type of archaic hominin. It was closer to a Neanderthal than it was to a modern human, but it was divergent enough from a Neanderthal that it must have been separated from Neanderthals by hundreds of thousands of years. So they called that a new subspecies or species, the Denisovans, named after the cave. They also found in the same cave a toe bone, probably from the fourth or fifth toe, that was very rich in Neanderthal DNA. These two samples then became the source of the next several years of analysis of ancient DNA. They both were high enough quality that it was possible to obtain very high-quality complete genome sequences for a Denisovan and for another Neanderthal from these two tiny bones in this cave. All right, so I can't go through all of the findings from the analysis of these two bones, but I want to show you a summary of what was known in about 2013, after the analysis of the complete genome sequences from these two bones. First of all, you see there are two distinct groups, the Denisovans and the Neanderthals. They are more closely related to each other than either one is to modern humans, but they're pretty distantly related to one another. They probably diverged hundreds of thousands of years ago from one another. In addition, there was now evidence for several different gene flow events. There's the one that I just told you about, from Neanderthals into these out-of-Africa populations, represented by this line here. Right here are Africans and here are non-Africans, modern humans. That event must have happened somewhere in the branch leading to the non-Africans. In addition, they found evidence of gene flow from the Denisovans into modern humans as well. This evidence appears to be confined to East
Asia. It's most strongly observed in Oceanic populations such as Papua New Guineans, but you see some hints of it as well in Han Chinese and Korean populations. This appears to be the result of a distinct interbreeding event between these Denisovan individuals and a group that was probably on its way to Southeast Asia. In addition, there was some weak signal indicating gene flow between the Denisovans and the Neanderthals. And then, perhaps most interestingly, there was a sign (this remains a mystery, something that we're interested in working on in my group) of some as yet unknown hominin, possibly Homo erectus, which is a much earlier group that is known to have lived in China and across Eurasia. That group has left some segments in the Denisovan genome that appear very strange relative to the rest of the genome. So there are short segments in the Denisovan genome that don't look like anything else that we've sequenced, essentially, and it's possible that that represents another introgression event, another interbreeding event, a very old one, but that remains an open question. Okay, so this is all background to the story that I'm going to tell you about from my group, very briefly. This story involved using this program that I just told you about, G-PhoCS, to jointly analyze all of the data that was available at this time. So we had now three Neanderthal genome sequences: the ones from the first paper, the ones from the second paper, and then a partial genome that had not yet been published, from a cave in Spain. We had the Denisovan genome, and then we had a series of modern humans whose genomes had been obtained. I'm using the Yoruban here as a placeholder, but we analyzed several of them together. We put them into this G-PhoCS program, which samples over all of the possible evolutionary histories that could explain the data, and after some careful analysis we came up with the following model.
So G-PhoCS detected evidence of essentially all of the gene flow events that I just told you about. For example, here's the gene flow event from the Neanderthals to the out-of-Africa populations. Here are the gene flow events from Denisovans to East Asians and Papua New Guineans. Here is the introgression from that mysterious archaic hominin that might be Homo erectus. Here is the introgression between the Neanderthals and the Denisovans, detected at quite low levels. But in addition, we found another introgression event, and no matter how we did the analysis, no matter how careful we were, no matter how we subsetted the data, we couldn't get this one to go away. And this one is quite interesting. It's going in the opposite direction. It suggests that some early modern human, from before the divergence of Europeans and Africans, left its imprint in the Neanderthal genomes. Remember, the event I told you about earlier was in the opposite direction: it was Neanderthals leaving a footprint in out-of-Africa human genomes. This is humans leaving a footprint in Neanderthal genomes, but it's shared across all humans. It's not present just in the out-of-Africa populations; you see the same signal essentially symmetrically in all modern humans. So it must date to a time before the divergence of these human populations. And it appears only in this easternmost Neanderthal genome, the Altai Neanderthal genome. So this is really a mystery. How can we explain this?
Well, here's our best guess at coming up with a scenario that might describe it. First of all, if we think about the human lineage, about 600,000 years ago in Africa the Neanderthals would have branched off, and they would have migrated off of the African continent. This is very early. Sometime later, around 200,000 years ago, just before these different African groups began to split apart from one another, the San and the West Africans for example, there must have been a group that interbred with Neanderthals. Now the question is where that could have happened, because we don't think Neanderthals at that stage lived on the African continent. So it suggests maybe there was an earlier migration out of Africa and an interbreeding event with Neanderthals, perhaps in the Middle East or east of the Caspian Sea, leading to that easternmost Neanderthal lineage. And then who knows what happened to that group of early modern humans. We don't see any representatives of them alive today. They could have been absorbed by the Neanderthals, they could have died out completely, or they could have migrated back and become absorbed by the other African populations. We don't know; we just know that we see no sign of them. And then some time later, about 65,000 years ago or so, there would have been the main migration out of Africa, the so-called out-of-Africa event, and subsequently the interbreeding event that had already been discovered by Svante Pääbo and his colleagues, in the opposite direction, from the Neanderthals into modern humans. So this was the subject of our paper a couple of years ago. There are a lot of questions about exactly how this could have happened, but the genetic evidence is very strong that there was at least one interbreeding event in the other direction, from early modern humans into Neanderthals. Okay, so I apologize for going long.
I'm going to wrap up there. The main point I want to make is that we can make use of these classical molecular phylogenetic techniques to study complete genome sequences and reconstruct human history. It's computationally expensive, it requires supercomputers and very sophisticated computational models, but we can do it, and we can come up with new discoveries, including these ancient interbreeding events. The other point I wanted to make is that simultaneously modeling all of the data gives us a lot of useful information. Most of the previous work published by Svante Pääbo and others has looked at subsets of the data in isolation. This finding that we were able to publish a year ago was made possible by the fact that we came up with a single model that had to explain all of the data together. We could only see evidence of this early interbreeding event in the opposite direction, from early modern humans into Neanderthals, after we had accounted for all of the signals of the other migration events. It was only by building a holistic model that described all of the data together that we were able to discover that event. And as I mentioned, we've found the first evidence of early modern human gene flow into Neanderthals, and our results suggest a likely possibility of an earlier migration out of Africa, although we have no other evidence to support that finding other than the inferred timing of the event. So finally, what's next? We're very interested in understanding that sort of phantom introgression event in the Denisovan genome, those hints of some very early introgression event, possibly from Homo erectus.
I have a student in my lab who's working very hard on trying to build models that can detect those early events. We're also very interested in coming up with ways of detecting specific introgressed segments, specific segments in the human genome that have come from the Neanderthals and Denisovans. And I didn't get a chance to talk about it, but it's very interesting to think about the possible disease-causing mutations that are out there in modern human populations that may have been inherited from Neanderthals. These Neanderthals had adapted on a different genetic background. They had adapted much earlier to the climate and conditions of northern Europe and Asia, and of the mutations that they passed to modern humans through this introgression event, some were probably advantageous, but some were probably disease-causing mutations. So there's a lot of interest in trying to understand which mutations now linked to disease might have come into our populations through these introgression events. Okay, I'm going to stop there. I'd like to thank all the members of my lab who have contributed to this work, as well as our collaborators, and I've had a number of funding sources over the years that have allowed us to pursue these sorts of questions. Thank you very much.