Thanks, Elliott, and thanks very much, Eric, for having me here. Eric Green, that is, and thanks to Eric Lander for the call in 1999 to work on the genome project. And, of course, thanks to Jim Kent for doing all the work. I'm going to say a few words about evolutionary genomics here. And following in Evan's footsteps, I'll even tell you about a gene. We have no idea what its function is, but it has evolved rapidly. Comparative genomics is the current version; I think we're progressing towards a true evolutionary genomics. We don't want to just compare genomes pairwise; we want to compare them to each other in such a way that we understand the origin of the DNA in each of the species. And that's the theme here. Right now on the genome browser at UCSC, you can see the comparison of the human genome to 27 other vertebrate genomes. This is a huge testament to NHGRI's sequencing prowess and to the tremendous achievements that have come out of these sequencing projects. And as we've heard from previous speakers, if you go through and just look at regions of conservation, things like exons and conserved non-coding regions stand out quite distinctly. Manolis Kellis led a great comparative analysis of the fly, and we heard about this from Andy. This is the same slide, which I've copped from that paper. My postdoc, Jakob Pedersen, also worked on this project, on another aspect, which I'll add to Andy's comments: you can also recognize RNA structure by comparative genomics. That comes from finding compensatory changes, where one side of the RNA helix changes and the other side changes to compensate. You can get very strong statistical evidence from that, and Jakob did that both in human and in this fly study. And of course, Elliott did an amazing job in leading the comparative genomics for the ENCODE project. I think this was a huge success.
We learned a lot from intensely looking at 1% of the genome, and the comparative part was a great foundation for that. A lot of it came down to this: whenever the conservation wasn't explaining the experimental data, everybody was scratching their heads and asking why not. So the interaction between the conservation that Elliott's group put together and the rest of the experimental data is a very fertile place to look for genome function. What we learned from that is that evolution is much more sensitive than a lot of the experiments that we can do. Evolution can sense subtle but statistically significant fitness advantages that are impossible to test in the laboratory. We've seen that time and time again, and I think it will be an emerging theme over the next decade. We've heard a lot about this already, so I'm just going to give you a very high-level picture. If you focus in on a particular region of the genome, then even with a small subset of the 28 vertebrate genomes that we're now dealing with, you can see the various forces of evolution at work. In this third position of the codon, you can see that there have been enough mutational opportunities, and the right kind of populations and timeframes, for it to have explored its potential. In other words, we've seen it try out different forms, and in fact it doesn't matter: we're going to make alanine whatever this third base is, and so you do see that drift. Neutral drift is evident and is measurable, and once we have enough data to see that clearly, we can contrast it with events where there's clear negative selection. The first two bases of this same codon show this evidence of negative selection: there were just as many mutations, but they did not get fixed in the population because they confer a fitness cost.
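As a small illustration of the codon point, the sketch below uses the standard genetic code to count, for each position of a codon, how many single-base changes leave the amino acid unchanged. The function name `synonymous_counts` and the choice of `GCT` as the example codon are assumptions made for illustration; they are not taken from the slide being described.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_counts(codon):
    """For each codon position, count how many of the 3 possible
    single-base changes leave the encoded amino acid unchanged."""
    original = CODON_TABLE[codon]
    counts = []
    for pos in range(3):
        n = 0
        for base in BASES:
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == original:
                n += 1
        counts.append(n)
    return counts


# GCT encodes alanine: every third-position change is synonymous,
# while every change at the first two positions alters the amino acid.
print(synonymous_counts("GCT"))  # [0, 0, 3]
```

This only captures the coding constraint; it says nothing about mutation rates or population dynamics, which is what the alignment itself is measuring.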
Now, what's exciting is that, against that background, if you look at regions that were clearly under selection for hundreds of millions of years and then suddenly changed in the human genome, you have something interesting, and I love that Rick Myers also pointed out this example. This is Monaco's gene, FOXP2. Svante Pääbo's group, in this Enard et al. paper, showed that it had a very interesting evolutionary history: there were two amino acid substitutions in a gene that hadn't had many amino acid substitutions for hundreds of millions of years, and of course this gene is required for the production of speech. So the suggestion is that these are possibly events in the positive selection for speech in our species, and by other means they have been dated to an appropriate period of our evolutionary history. Now, of course, that doesn't mean that we have any proof that this is a speech gene. Again, as Rick mentioned, Svante Pääbo has made transgenic mice with humanized versions of this gene, and he's listening to the mice to see if there's any difference, and of course there's no definitive conclusion from that. So one of the lessons here is that it's very difficult to test human-specific or primate-specific changes in particular, because our primate models are precious, and of course for human-specific changes it is impossible to do any direct experiment. The complex connection between genotype and phenotype also requires an immense amount of study. So with that in mind, as an excuse for showing you a gene that I really know nothing about functionally, I will go through what we did. The people involved are, as usual, shown here at the bottom of the slide in the credits. I'm not going to mention them all by name, but this was Katie Pollard, who's now at the University of California at Davis, who scanned the human genome for instances of this signature of positive selection in a region that was previously under negative selection.
And so here you have the region that showed the most dramatic example of sudden positive selection after long negative selection, which we call human acceleration. Human Accelerated Region 1 is 118 bases that had only one change between chicken and chimp if you went pairwise, and overall was incredibly conserved for about 300 million years. Then suddenly, in 118 bases, bam, 18 changes in the human lineage. That indicates that something was going on to shake this region up. We found 49 such regions that were significant at a 5% false discovery rate. And looking at this top region, we can see that it corresponds to a previously unexplored gene on chromosome 20. Now, all the wet lab work was done by Sofie Salama. She mapped this gene out for the first time and found that there's actually a gene on the forward strand and one on the reverse strand, and they overlap just at this interesting region. It turns out that this gene doesn't make a protein at all. It makes an RNA, so this returns to my previous example. It also turns out that Jakob Pedersen, whom I mentioned earlier, independently found this gene in a completely separate whole-genome screen, where he was looking for regions that showed the most dramatic compensatory substitution pattern in the genome. It was in the top 1% of his list, and if you take conservation into account, it rises very close to the very top. So we now get a confirmation that this change wasn't just a speed-up in mutation; there must have been some kind of selective force going along with the mutations. If you run a program to predict the RNA structure and then look at how that RNA structure has changed, you find that 10 positions within the RNA helices are changed in the human genome, and they form five perfectly compensatory changed pairs.
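To make the idea of a compensatory changed pair concrete, here is a minimal sketch that, given an ancestral and a derived RNA sequence and a list of positions paired in a helix, reports the pairs where both bases changed and base pairing is preserved. The sequences, helix coordinates, and the function name `compensatory_pairs` are invented toy values for illustration; they are not the HAR1 sequence or the method used in the screen described here.

```python
# Pairs allowed in an RNA helix: Watson-Crick pairs plus the G-U wobble pair.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_pairs(ancestral, derived, helix_pairs):
    """Return the helix pairs (i, j) where both bases differ between the
    two sequences and the positions can pair before and after the change."""
    found = []
    for i, j in helix_pairs:
        both_changed = ancestral[i] != derived[i] and ancestral[j] != derived[j]
        was_paired = (ancestral[i], ancestral[j]) in PAIRING
        still_paired = (derived[i], derived[j]) in PAIRING
        if both_changed and was_paired and still_paired:
            found.append((i, j))
    return found


# Toy example (hypothetical sequences): a 3-base-pair stem closed by a loop.
ancestral = "GCAUUUUUGC"
derived = "GGAUUUUUCC"  # position 1 changed C->G, position 8 changed G->C
helix = [(0, 9), (1, 8), (2, 7)]

print(compensatory_pairs(ancestral, derived, helix))  # [(1, 8)]
```

A single change on one side of a helix would break the pair and is not counted, which is why a run of perfectly matched double changes is hard to explain by an elevated mutation rate alone.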
So both bases of each pair have changed in order to maintain base pairing in what were originally these helices. The chance of that occurring by coincidence is 10 to the minus 9. So you see the hand of selection clearly in this gene. And in fact, one of the helices has also altered its structure: this helix is now extended at the expense of this helix, so we also have a slightly new structure in the human. These structures have been confirmed by dimethyl sulfate probing. I won't go into the experimental details, but basically this dark patch here, comparing the two different RNAs that we've made in large quantities and tested, shows that this position where the dot is is relatively exposed near the end of the helix in chimp, but it's nicely buried in the middle of the helix in human, and so it's resistant to the dimethyl sulfate treatment there. To put it the other way around, you get a lot of hits in chimp and resistance in human. That essentially establishes the correctness of this structure prediction, which was already evident from an evolutionary point of view. Now, the exciting thing that happened, and the reason this comes to our attention as a possibly important gene for human evolution, is that Pierre Vanderhaeghen, our Belgian colleague, showed up with tissues in which we could take the clones that Sofie had prepared and actually do in situ hybridization in human developmental neural tissue. So you're looking at in situ hybridization, in this case in a human fetus at 17 gestational weeks, and you can see that this gene is expressed in specific layers in the developing cortex. This is the subpial granular layer, this is the pial surface up here at the top, this is the marginal zone, and this is the evolving cortical plate. So it's expressed very close to the surface, in a specific pattern on either side of the marginal zone and subpial granular layer.
And this is recapitulated in the macaque at a comparable time in development. What's interesting is that there's another very important neurodevelopmental gene being expressed at this time, the gene reelin, and you can see that in the immunohistochemistry next to it. During about nine to 20 or 24 weeks of gestation, the six-layer structure of the cerebral cortex is developing. At that time there are migrating neurons coming radially up from the subventricular zone, and they are sensing a gradient of reelin. The reelin is produced by these Cajal-Retzius neurons, some triangular-shaped neurons up here near the pial surface, which create the gradient, and based on sensing the reelin concentration these migrating neurons get off and form the various layers of the cortex. In fact, there are six waves of migration at this time, according to the reelin master clock, and that forms the brain. So, needless to say, we were extremely excited when we looked more carefully at the combined immunohistochemistry for reelin and the in situ hybridization and saw that HAR1F, this rapidly accelerated gene, comes on at about the same time in human cerebral cortex development. It's expressed specifically in the same cells as reelin, the Cajal-Retzius cells, and it goes off when reelin goes off. So we have strong circumstantial evidence that this rapidly changed human gene may have something to do with the development of the cerebral cortex, which makes it a very exciting evolutionary candidate, and we're pursuing this. I emphasize that this is a hypothesis: we have no direct connection at this point, and it is all circumstantial. But it is a tantalizing hypothesis that this gene may have been involved in human evolution.
One thing that happened, as you can see here on this radiograph comparing the chimpanzee to the human, is obviously that the cerebral cortex expanded by about threefold during the period of evolution that we're talking about, since the divergence of human and chimp. Now, we have no direct evidence that this gene had anything to do with that, but it shows you how we can go from a whole-genome scan down to a hypothesis with amazing ease now. I think we're entering a transformative period in the study of human evolution, where we can start to use our genetic tools to look deeper, and in a non-biased, non-hypothesis-driven way. Well, I'll just provide a couple of other quick stories here. Many of you know, of course, that the human genome is mostly so-called fossil junk left over from transposons, mostly retrotransposons, whose life cycle is shown here, that copy themselves and spread throughout the genome. It's likely that much more than half of the genome is, and the families are just too old for us to recognize at this point. We use these retrotransposon families as a model of neutral evolution, a yardstick against which to empirically measure the rate of neutral evolution. Comparing to that yardstick, if you look through 50-base-pair windows of the human genome compared to the orthologous region in mouse, you see an excess of highly conserved regions. That led to an estimate that about 5% of the human genome shows, in comparison with the rodents, signs of purifying selection, and that has held up fairly well in subsequent studies with many more species and much more care. So it's clear that there is a reasonable percentage of our genome that is devoted to some kind of function which is under purifying selection. It's much more than is coding. You know, of course, that less than 1.5% of the genome codes for protein.
So at least twice as much as what codes for protein is under selection for something else, and we assume that it's regulatory of some kind, but most of this is unexplored so-called dark matter. Now, the regions of strong conservation often cluster near developmental genes, and that's a very, very strong pattern. Recognizably similar sequences to these highly conserved vertebrate sequences are not found in invertebrates, so there's strong evidence that this may be an independent evolution of regulatory architecture in the vertebrates. Some of the stretches are so conserved that they're virtually unchanged for hundreds of millions of years. We call these ultraconserved; Gill Bejerano discovered them and is working on them. He's now at Stanford. And Eddy Rubin's group, in particular Len Pennacchio, had a wonderful paper recently where he analyzed hundreds and hundreds of these and showed them in assays to be distal enhancers, but then Nadav in his group knocked out four of them and didn't see a phenotype. So they exert clear effects when you do an enhancer assay, and oftentimes we see very strongly recapitulated, very precise expression patterns that are under the control of these enhancers. It's clear they're doing something, but it appears they're in largely redundant systems, so knocking out a single one doesn't have much measurable effect. One thing I'd like to spend some time on, but can't, is that recent evidence indicates that many of these were actually propagated from one position to another by transposons, so we can trace their origin back to ancient transposons that operated and moved this material around. And if you think about the numbers game, it's not surprising. We have hundreds of thousands of these elements. There's no other reasonable duplication process.
Segmental duplication is great, and I think Evan has highlighted some spectacular phenomena associated with that, but it's not pervasive enough to have created these massive networks of regulation all throughout the genome. So there's strong evidence from several groups, including Lander's group, the Elgar lab, and Okada's lab, that we have a system that was mediated by ancient transposons. Now, there was always the question: if you're just looking at evolutionary conservation between species, how do we know these aren't mutational cold spots? Again, following on Andy's talk, the way to answer that is to look at the population dynamics of these sites today. If you look at derived allele frequency, which is the frequency at which new alleles are segregating within the population, you get around the cold spot problem, because we're only looking at the alleles that are segregating, and among those alleles we're asking: do these new, or derived, alleles get to high frequency in the population? So here's a histogram of the frequencies of segregating alleles, 134 segregating derived alleles in the ultraconserved sites. Doing this work with Rick Wilson's laboratory, we found that in fact it's very rare: only one of them got to more than 50%. You see a remarkable lack of high-frequency derived alleles, and this contrasts with nonsynonymous changes in coding regions, which do come to high frequency and fix with reasonable frequency. If you convert this raw frequency spectrum into an estimate of the selection coefficients for these sites, the average selection coefficient is about three times as strong as that at the nonsynonymous, amino-acid-changing positions. So that is dramatic evidence that these are actually under selection today, even if Nadav can knock out the damn things and not see any phenotype.
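The derived allele frequency argument can be sketched in a few lines: compute each segregating site's derived allele frequency, bin the frequencies into a histogram, and ask what fraction exceed 50%. The counts and sample size below are made-up toy numbers, and the function names are hypothetical; this is not the data or the analysis pipeline from the study described, and it stops short of estimating selection coefficients.

```python
def derived_allele_frequencies(derived_counts, sample_size):
    """Convert per-site counts of the derived allele into frequencies."""
    return [count / sample_size for count in derived_counts]


def frequency_histogram(freqs, n_bins=5):
    """Count sites in equal-width frequency bins covering [0, 1]."""
    counts = [0] * n_bins
    for f in freqs:
        idx = min(int(f * n_bins), n_bins - 1)  # put f == 1.0 in the last bin
        counts[idx] += 1
    return counts


def fraction_high_frequency(freqs, threshold=0.5):
    """Fraction of segregating sites whose derived allele exceeds the threshold."""
    return sum(f > threshold for f in freqs) / len(freqs)


# Toy data (invented): derived-allele counts at 5 sites in 40 sampled chromosomes.
freqs = derived_allele_frequencies([2, 5, 3, 22, 12], sample_size=40)

print(frequency_histogram(freqs))      # [3, 1, 1, 0, 0]
print(fraction_high_frequency(freqs))  # 0.2
```

A spectrum skewed toward low frequencies, relative to a neutral or nonsynonymous comparison set, is the pattern that points to ongoing purifying selection rather than a low mutation rate.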
So in closing, we have a grand challenge here: to reconstruct the evolutionary history of the human genome and use it as a guide to understand processes that are relevant to our evolution in particular. Evan made a spectacular case, I think, for the recently evolving regions being the most exciting and also having a great connection with human disease. The right way to do this is not pairwise comparison but to actually reconstruct the history, as Evan indicated. We think we can do that with very high accuracy back about 100 million years for most of the genome, maybe 90% of the genome, and in the ultraconserved regions we can go back a lot further, and all of the genome centers are creating data for this. What we would like to be able to do is to model not just the single-base changes but the entire evolution of the genome: the chromosomal changes that you see, the segmental duplications that Evan studies, the reciprocal translocations, the speciation events that result in independent changes that we can then trace back to a common ancestor. The problem of putting together all of this data is enormous, and we have been very much wrapped up in it for three or four years. We're working very hard with Webb Miller's lab and looking at the wonderful data that's been generated by all of the sequencing centers. You can see that with this density of sampling from the placental mammals we have a good shot, and we can prove this mathematically, a very good shot at producing a very accurate history from this ancestor forward for most of the genome. If you go back further, to the marsupials and so forth, there's a long branch in there that's going to be difficult. There are going to be a lot of ambiguities, and it's going to be difficult to reconstruct. But here there's enough semi-independent data bearing on this, say, boreoeutherian ancestor to give us a good, accurate reconstruction. And it will look something like this.
This is an early model where we're looking at evolutionarily non-rearranged segments. We call them atoms: non-rearranged and non-partially duplicated segments. Here we broke up chromosome X into 200 such segments, and we're looking at its evolution. I want to point out that it isn't just inversions and translocations that you're seeing here; that little line is a duplication. We've estimated that there was a duplication event, and we've traced the two copies as well. This is actually a new thing, and we want to be able to do this on a large scale, to analyze rearrangements and duplications at the same time. I think that is the key. If we don't get there, if we're always doing these independently, we're really not understanding the correct substrate for evolution. We're really not putting together the correct story. Mathematically, this has turned out to be enormously challenging. Of course, what we want is to have the high-level picture, but also to be able to zoom in and look at a region like HAR1, where you have an extraordinary evolutionary history, and this is just a little snapshot of dating some of the events that happened along there. And of course, we'd love to get Svante Pääbo's or Eddy Rubin's Neanderthal sequence to fill in those question marks at that line. So the grand challenge I see for evolutionary genomics is to reconstruct the history of every base in the human genome with as great an accuracy as you can and, where you don't know the answer, to indicate the level of uncertainty. From that, we can recognize functional elements by the signs of negative selection, and look for positive selection that identifies the origins of evolutionary innovations that are specific to the human lineage. Thanks to the group: we have Sofie Salama leading the wet lab, and this is now the wet lab.
At this point, it's grown very rapidly, and they've done a great job starting to explore some of the hypotheses that we've generated by our bioinformatics work. And of course, Jim Kent is leading a very large group of browser people, and then there are a number of students and postdocs and former postdocs that I have to acknowledge who contributed to this work. Katie Pollard at UC Davis, Gill at Stanford, and Adam at Cornell have done a great job in laying the foundation for this work, and we have a large number of external collaborators, all of the genome people, and that's the whole list of everybody. Thank you.
We have time for a few questions while we get a presentation loaded. And by the way, there are plenty of seats towards the front, so those of you who are trickling in at the back, come on down. You can even sit next to the NIH director if you want. There's an open seat next to him. Okay, over here, Karen.
David, we're friends, so I can say this. As an alum of Celera Genomics, we cursed the day that phone call was made in 1999. We're all still trying to get over it. I am really excited about the dark matter sort of work that you're doing, but I wanted to go back to a previous part of your talk, where you were talking about the annotated genes that we have already with no functional attribution, and that to me seems like a more embarrassing problem for us. Do you see that as something that's amenable to a breakthrough anytime soon?
That's very difficult from the point of view of evolutionary history. We can identify those with a unique evolutionary history, but it's a long way from there. There's a long experimental road, I think, to really getting down to what we would be comfortable calling an identified function.
Elliott?
So every once in a while, I try to wrap my head around how an ultraconserved element can be so conserved.
These regions are more conserved than protein-coding genes, and when we think about what we know about binding sites, which can have some wiggle room in the various bases, it's sometimes more of a barroom conversation than anything else, but I'm curious about your thoughts on how something like that can be so conserved at every base for so many millions of years.
Every so often, I scratch my head about that too. No, that's a very difficult unsolved riddle at this point. Are there multiple overlapping binding sites, some kind of promiscuous binding that we don't completely understand, that governs these enhancers? There's possibly some kind of subtle quantitative effect that's being selected for. Lots of competing binding sites, maybe a dual function as an RNA, or something else. It's still wide open.
Francis?
So, when we're trying to figure out which DNA sequences are particularly responsible for very recent human evolution, it is, of course, tempting to look at the things that are gains of function. But obviously there is the other possibility, and one that Maynard Olson in particular, our Oracle of Delphi when it comes to the genome, has always put forward: that it probably won't turn out to be gains, it will be losses. You have something that was holding you back, you got down out of the trees onto the savanna, you didn't need it anymore, and that opened up opportunities. An example is this MYH16 gene that's responsible for jaw muscles: maybe if your jaw muscles are puny, you give your cranium a chance to expand and take up more space with brain. So, is there a systematic kind of approach to that question, in terms of what things have become non-functional? Is there a story there, or is there just a bunch of anecdotes?
Well, we're starting to get systematic, and thanks for asking that question. We have a paper that will be coming out soon on a screen on that subject, and I'll happily send you a draft of it.
But yes, of course you can, and I think loss is extremely important, and we've found a few more. But I think we're still a ways from being systematic about it, because a screen for losses greatly enriches for any artifacts in our assembly or matching algorithms and so forth, and so it's hard work to sort through that and make sure that it isn't a gap or a glitch or something that you've introduced. So there's still a labor-intensive part of that, but it's coming.