On April 14th, 2003, the founding directors of the National Human Genome Research Institute, Dr. James Watson and his successor, Dr. Francis Collins, declared during a press conference that the Human Genome Project had reached its goal of producing the first complete reference sequence of the human genome and that the project was officially over. At the same time, the genome institute published a strategic plan that laid out an ambitious research agenda for the field. My name is Larry Thompson and with me is Dr. Eric Green, the current director of the National Human Genome Research Institute and one of the co-authors of that research agenda, which was published in Nature.
So Dr. Green, a decade ago NHGRI laid out a lot of audacious, big challenges for the whole field of genomics. I'd like to ask you about some of those and how we've been doing. Let's start with the really basic stuff, which is where the plan itself started. The field needed to know, now that we've got the sequence, what's in there? What are the parts of the genome and how do they work?
Yeah, so the genome project, of course, just ordered the letters, but didn't tell us what their meaning was. Many efforts have been pursuing that for the past decade. At NHGRI, we launched the ENCODE project, the Encyclopedia of DNA Elements. ENCODE has revealed, over and over again, many surprises: all sorts of information about how much of our genome actually gets made into RNA, and lots of questions and some answers about what that RNA might be doing; lots of information about all the different parts of non-coding DNA that get bound by different proteins that are probably playing a role in choreographing the expression of genes in different tissues at different times; and just remarkable insights about how complicated the human genome really is.
Probably about one and a half percent of the genome accounts for the protein-coding areas, but there's this other five to eight percent that's highly conserved across all mammals. So what's going on with that?
So that was a surprise. When the insights originally came out on that, when we sequenced the mouse genome immediately following the sequencing of the human genome, there were some who almost couldn't believe what they were seeing. Then you added the rat genome and then the dog genome and others, and that number has held up, something on the order of five to eight percent. Now we've done analyses of several dozen mammalian species, and there seems to be a core set of sequences that are highly, highly conserved across virtually all mammals. And as you pointed out, a minority of those are protein-coding regions, and many of those non-protein-coding regions have been conserved through evolution with almost as tight a grip on them as on the protein-coding sequence.
We're 10 years after the completion of the human genome. How many genes are there, and how has the definition of a gene evolved over this time?
Ten years later, that number's pretty much settled down to about 20,000, plus or minus maybe a few hundred; people argue around the edges. The other thing that I think has evolved over the past decade is the definition of a gene. What does a gene mean? Once upon a time, we believed in the central dogma: DNA made RNA and RNA made protein, and the definition of a gene was that segment of DNA that contained the information for making a protein. But that was before we realized that there's more complexity there, because DNA can make RNA, and RNA can go off and do all sorts of important biological things other than making proteins.
One of the other things that was really clear at the end of the Human Genome Project was that there was this very small percentage of human variation, maybe about one-tenth of one percent was the estimate, and it seemed clear that within that small amount was the key to understanding why some people are at risk for disease and other people are not. So what do we do to go about trying to understand that level of human variation, and what do we know now?
Well, of course the reason we wanted to know about variation is that we weren't just interested in the hypothetical human genome sequence that had been generated by the Human Genome Project. Eventually we wanted to analyze individuals' genome sequences, perhaps eventually as part of clinical care, and we were going to be most interested in the differences. But while we knew a rough estimate, that individuals differ at maybe one in a thousand bases, we also recognized that it's not that each of us has unique differences; in fact, across groups of people, there are a lot of common variants out there. And that became a bit of a finite problem: if you just sequenced or analyzed enough individuals' genomes, you could develop catalogs of those variants, and knowledge of those variants could then be a useful tool for pursuing studies to figure out which of those variants are biologically relevant and which are not.
How has the field gone from identifying these variants across these populations to applying that in the clinic?
It was reassuring when, about seven or eight years ago, the first demonstrated success story of a genome-wide association study was published in Science. They were actually quite fortunate in that study: they got down to the gene that had a variant in it conferring risk for age-related macular degeneration. That first example was about 2005, and wow, all you have to do is follow the literature ever since.
There's just been an avalanche, if you will, of publications describing successful genome-wide association studies. Over 1,400 publications since that time have implicated hundreds and hundreds of regions of the human genome, at least at a statistical association level, with literally hundreds of different diseases or traits of interest.
There's clearly been an investment by the Institute in technology. How has that transformed how this work can be done?
It's been remarkable. I think what's happened in the arena of DNA sequencing technology development over the past 10 years, almost precisely the past 10 years, has been truly spectacular. Just go back 10 years, to generating that first sequence of the human genome. The active sequencing part took about six to eight years and consumed about a billion dollars. That was about the cost for the actual act of organizing for sequencing and actually doing the sequencing. Ten years ago, when the genome project ended, if those same groups had immediately produced a second sequence of the human genome, hypothetically, they estimated it would probably take them maybe three to five months instead of six to eight years, but it would still cost about 10 to 50 million dollars. But now fast forward 10 years, after these spectacular new technologies have been developed, and we're well under $10,000. In fact, the current estimates for getting a sequence of a human genome are something on the order of $3,000, $4,000, $5,000, en route to $1,000, I think, within a year or two. And remarkably, you could do it today in a couple of days and, I'm being told, by the end of this calendar year probably within a day.
What can we now say about the genetics of diseases?
The real success stories in terms of really reaching the finish line with respect to understanding the genomic basis of disease have so far been limited almost exclusively to rare diseases, not the complex diseases that have multiple genetic components. But what's going on with rare genetic diseases has been truly remarkable. When the genome project began, we knew the genetic basis of maybe on the order of 60 diseases that were caused by defects in a single gene. The genome project accelerated the pace at which we were able to discover those genes and those mutations, so that by the time the genome project ended, that number was about 2,200. Over the last 10 years, it's more than doubled. We're almost up to 5,000 disorders for which we now know the genetic basis.
And we are already seeing that the FDA is using genetic information and advising doctors how to use certain prescriptions, more than 100.
More than 100 now. When the genome project began, there were only about four where we thought genetics really was at all relevant for the decision making about what drugs to give patients. And we are very confident that list is going to grow.
Two years ago, the institute created, with the help of the community, a new strategic plan looking forward. So what's the future vision of where the field is going?
The strategic plan of two years ago broadens that and gets much more specific, thinking much more critically about that logical progression: going from understanding genome structure to understanding genome biology, then a heavy emphasis on using genomic approaches and knowledge to understand human disease, and then using that information to advance medical science. But it doesn't stop there; it recognizes there's a responsibility to continue to do studies that take that knowledge about advancing medical science and examine what happens when you actually go to deliver it in a healthcare system, which often gets very complicated.
And so it's now that full view, everything from the bases of the genome to actual healthcare implementation of genomic approaches, that is articulated in this current strategic plan, and everything we think about really hangs off of it in some way.
Thank you, Dr. Green.
The Nobel Prize winner Walter Gilbert famously calculated that it would take 15 years to sequence the human genome for the first time, but predicted that it would take another 100 years of basic biological research to understand it. The first 10 years appear to have gotten off to a pretty good start. I'm Larry Thompson at the National Human Genome Research Institute.