Okay. Good afternoon, everybody, and welcome to Clinical Center Grand Rounds. I think today promises to be one of our special Grand Rounds. Of course, they're all special, but this one will be extra special. Both of our speakers today are from the National Human Genome Research Institute, and we're pleased and very excited to welcome Eric Green, the Institute's Director, and Dr. Leslie G. Biesecker, who is Chief of the Genetic Diseases Research Branch. What I'll do is introduce each speaker, and then Dr. Green will give his talk. We'll have a couple of questions, and then Dr. Biesecker will talk. So Dr. Green is our first speaker, and he will present "Sequencing Human Genomes Circa 2010." Before being named the Institute's Director just a little over a year ago, he was the Institute's Scientific Director and headed the NIH Intramural Sequencing Center, where he led several major efforts using large-scale DNA sequencing to address important problems in genomics, genetics, and biomedicine. A graduate of the University of Wisconsin in Madison, Dr. Green earned his MD and PhD degrees at Washington University in St. Louis. He completed a residency in laboratory medicine and a postdoctoral research fellowship there. In 1994, he joined the newly established intramural program of the National Center for Human Genome Research at the NIH, later made an Institute. Dr. Green's memberships include the American Society for Clinical Investigation and the Association of American Physicians. He's also founding editor of Genome Research and is on the editorial board of Mammalian Genome, and since 2004 he has been co-editor of the Annual Review of Genomics and Human Genetics.
What you don't capture from this short introduction is that he's really a tremendous pillar of our community, helping, for example, to lead the rejuvenation of our Office of Education, serving on numerous committees, and helping me on a number of initiatives such as getting some tools for genomic analysis here at the Clinical Center. Our second speaker is Dr. Leslie G. Biesecker, who will present "The Clinical Annotation of Genomics: Challenges and Opportunities." He's a medical geneticist and has been chief of the Genetic Diseases Research Branch at NHGRI since 2006. He directs several research projects, including one studying the clinical and molecular etiology of rare birth defects and syndromes, and another on the clinical application of high-throughput sequencing to delineate common phenotypes. Dr. Biesecker received his BS degree at the University of California, Riverside, and his MD at the University of Illinois, Chicago College of Medicine. After an internship and residency in pediatrics at the University of Wisconsin hospitals and clinics, he completed a clinical fellowship in pediatric genetics at the University of Michigan. He later was a research fellow in pediatric genetics and medical hematology and oncology there, and he came to the NIH in 1993. He's a member of the American College of Medical Genetics. He's also a member of the ASCI and the American Society of Human Genetics, and he's on the editorial boards of Clinical Dysmorphology, GeneTests, and BMC Medicine. He's also deputy editor of the American Journal of Medical Genetics. Now please welcome Dr. Green, who will start this off.
Thank you, John. It's a pleasure to be here. And I would immediately tell you that this is a bit of a repeat performance. Les and I gave a grand rounds here, actually at the beginning of 2009, and you may ask the question, why less than two years later would we be here doing essentially a similar tag team?
And I think we're going to convince you the reason for that is the remarkably fast-moving nature of genomics, especially in the area of genome sequencing, which is really what my talk will try to describe to you. And I think you'll be convinced by the end of this hour exactly why it is that we're back here less than two years later. Here's my obligatory slide saying that fundamentally I'm a boring person and I have no interesting financial relationships. The objectives are summarized here: basically, I'm going to tell you about advances in DNA sequencing technologies. They're changing the face of genomics. They're creating all new opportunities for sequencing human genomes, and as a result of that, these technologies are catapulting us into the era of genomic medicine. The origin of all of this is the Human Genome Project, an international endeavor that represents the flagship effort of the institute that I now lead. Remarkably, one of the milestones of the genome project occurred 10 years ago. I can hardly believe it's been 10 years, and it's been prominently featured in the literature this year. In December, this past December marked the 10th anniversary of the announcement of a draft sequence of the human genome, and come February expect another scientific blitz of media attention with the celebration of the 10th anniversary of the publication of the draft sequence of the human genome. That issue of Nature is shown on the previous slide. I've been interviewed over and over again about this anniversary event. I'm sure this will happen again in February. I'll just steal one of the quotes that Nature ran in that issue on the previous slide, because I think it summarizes my whole view of this. That is, when the Human Genome Project was envisioned, and I remember that, I was a postdoctoral fellow and a clinical pathology resident.
Scientific leaders of the day predicted that it would take 15 years to generate the first sequence and a century of biologists to understand it. What I basically was quoted as saying is, I think they got it right. While we still don't have all the answers, being a mere 10 percent of the way into the century with a human genome sequence in hand, we have learned an extraordinary number of things about the human genome and how it works, and how alterations in it confer risk for disease, and it's particularly the latter that is so relevant to this institution and to the clinical center in particular. And the notion of the relevance of genomics for studying human disease and medicine is not new to 2010. Rather, in both the popular press and the scientific press, even when the genome project was out of the gates but far from being completed, the idea that genomics and insights about the function of the human genome could be combined with the practice of medicine was a very real one, and it served as one of the key rationales for why we sequenced the genome in the first place. Now, there are many phrases that are used for this. Shown on this slide is genomic medicine, but other names for it are personalized medicine, individualized medicine, I've even heard precision medicine. The bottom line is that by this I mean healthcare tailored to the individual based on genomic information, and what this grand rounds is all about is the acquisition of that genomic information. If I were to use one slide that summarized what my passion is and what my institute's passion is, it would be reflected here, whereby we've completed the Human Genome Project and we know the ultimate goal is to realize genomic medicine, to improve healthcare based on genomic information.
I will tell you that we try to define the path that's going to get us there and we know many of these steps, but we're certain there are lots of question marks, things we don't know, and surprises we haven't encountered. What I do know is we were successful at the genome project. We simply must be successful at realizing genomic medicine, and if we are successful, we'll truly fulfill the promise of why we did the Human Genome Project and why we sequenced the human genome. So a one-slide summary of what our institute is about and what I think the field of genomics is now about, in particular the part relevant for health, is that our central mission simply must be to establish that path to the realization of genomic medicine. Now there's a lot that has gone on since the end of the genome project and there's great progress that has been made on many fronts. I will share with you the fact that we have spent considerable effort trying to understand how the human genome works. Efforts like the ENCODE project are trying to decipher the function of all the bases that are functionally important in the human genome. We've developed incredibly rich catalogs of genomic variation across human populations through efforts such as the HapMap project and now the 1000 Genomes project, whereby very, very deep catalogs of these variants are being collected and used for studies that in part can be applied to the study of human disease. And so number three, there have been great advances in understanding the genetic basis of human disease.
And as many of you know, these include not only major advances in understanding rare genetic diseases but also, thinking about the greater impact on the world, common genetic diseases, which are what more typically fill hospitals and clinics around the world. We are gaining great traction in understanding how to go about doing studies, known as genome-wide association studies, for example, that are giving us hundreds and hundreds of candidate regions of the genome that must contain, and should be shown to contain, variants that are conferring risk for these complex genetic diseases. And so forth. I don't have time to summarize those advances. The focus of this grand rounds is the technological advances that are facilitating all of those things, that are accelerating our movement from the left side of the slide to the right side of the slide. Now, in 2003, when the Genome Project ended, our institute published a vision for the future of genomics research and set a very important battle cry, if you will. That battle cry was set within this metaphorical summary of how we conceptualized the different initiatives in genomics. A very key one was a cross-cutting element shown here, labeled technology development. The basic idea was we knew that technology advances in genomics would be pervasive in advancing the field at multiple levels, whether it was looking at biology, looking at health applications, or thinking about the societal consequences of genomics. And in particular, what we were quoted as saying at that time was that technological leaps that seemed so far off as to be almost fictional, but which, if they could be achieved, would revolutionize biomedical research and clinical practice, would be a very important thing to pursue. And interestingly, we gave as an example the ability to sequence DNA at costs that are lower by four to five orders of magnitude than the current cost, allowing a human genome to be sequenced for $1,000 or less.
The battle cry was the $1,000 genome. And the idea was to take the price tag of generating a sequence of the human genome, which for the Human Genome Project cost something on the order of a billion dollars. You can argue with me a little around the edges, but if you round up, it's about a billion dollars. And could you, through the encouragement of grants and encouragement of companies to pursue new technology developments, shave enough zeros off this number that eventually you'd be able to sequence a human genome for something like $1,000? And $1,000 is a very reasonable price for a test, a clinical assay, if you will, that would provide such rich information about an individual patient. And the field really embraced this battle cry. And so factories such as this, which were responsible for generating the first human genome sequence by the Human Genome Project, were considered to be something that we wanted to do away with, replacing it with some fancy schmancy technology, shown in icon form here, that would somehow revolutionize our approach for sequencing genomes through some nanopores or micro channels or whatever fancy technology one could envision. And the great news is that it has been wildly successful. And in fact, what I would tell you is that I've been involved in genomics for 20-something years now, and I've seen lots of technological advances, some that have really impressed me, but nothing has overwhelmed me and many other people in the field of genomics as much as the advances that have taken place in DNA sequencing over the past five years. Shown here, again in icon form, are some of the cartoon-like views of these new technologies. I'm not going to drill down into any of them specifically. I'm not even going to name names because I don't want in any way to imply any favorite companies. If you want to read about these, practically every single month another review article comes out.
And the reason why you need another review article practically every month is because the technologies just keep coming. And so anything you read a year ago is outdated already. And the fact is, these are not just theoretical instruments. These are real instruments. Again, shown here are some of the ones that are already available. And in some ways, it's a confusing landscape because you have so many options. And each new technology that comes to market seems to have improvements in some ways and problems in other ways that we are grappling with. But overall, they are moving the ball down the field at an accelerated pace, at times sort of a shocking pace in terms of trying to manage programs and predict where things are going to be a year from now. If you ask me right now, what's the best machine to buy or what's the best technology, it's a snapshot answer because the fact of the matter is there are more coming down the road. And the analogy I use is that it's like sitting at an airport and looking out on a clear day. You can just see multiple planes getting ready to come in for a landing one after the other. It is exactly the case in DNA sequencing, where yes, we have several planes on the ground, like the platforms, the machines I showed in the previous slide. But I can tell you, six months from now, there's another one that's going to land, a year from now, hopefully another one, two years from now, one that'll completely antiquate all the others and so forth. And so I like to show this slide, but boy, I feel like this is a good week to be showing the slide because any of you who got your issue of Nature last week saw DNA sequencing featured on the cover, featured as sort of a new technology that was reported. And specifically what they reported (here's the primary paper and here is the News and Views that was actually written about it) was this thing called graphene nanopores.
And again, I'm not going to drill into the technical details. I'll just quote the first paragraph because I like it because it gives us credit for something. It basically said the idea that DNA could be sequenced by running a strand through a tiny hole, a nanopore, and reading off bases by electrical detection was suggested 14 years ago. Since then, significant progress has been made towards this goal, provoked by the US National Institutes of Health's $1,000 genome challenge. So this has really catalyzed successfully sort of a new technological surge that's changing the face of genome sequencing. I will pause and point out to you that what I showed a couple of slides ago were different instruments that you could buy, and we might consider buying some of them for the clinical center. Certainly some of those instruments are in laboratories here at the NIH. There are other models for how this is all going to play out. For example, one company, this company, Complete Genomics, has a proprietary technology for sequencing genomes and they don't want to sell you instruments. They want to sequence genomes as a service. And I will just point out to you that I have a hard time predicting whether, five or 10 years from now, when we sequence lots and lots of patients' genomes in the clinical center, which I guarantee we will be doing, we'll be doing that right here or maybe we'll just be outsourcing it. It may become such a commodity, just like for any of you who do PCR: I don't think you synthesize your own oligonucleotides anymore. You outsource it. Yet 20 years ago, when I was a postdoc, that's what postdocs did. You synthesized your own oligonucleotides. Today, we sequence our own genomes. Five years from now, with companies like this and other companies that will spring up, we might be outsourcing the sequencing of genomes. That would be a fine thing. Actually, that would be a good thing. So what does this look like?
Why is it that businesses are building up around this and what's sort of been the overall impact of these new technologies? There are various data I could give you. I think one way to simply summarize it is to show you what the cost for DNA sequencing has been and what its trend is, in real data. Our institute supports large sequencing centers on the outside and we actually get progress reports in all the time where we capture those costs. So we can show you real data. Before I share this slide, let me remind you of a concept. There's a concept known as Moore's Law. Moore's Law is, in the computer industry, the phenomenon that every 24 months or so, the horsepower, the processor power associated with computers, tends to double. And that's phenomenal. And if you can keep up with Moore's Law, you are really rocking, if you will, in your technology development. So here's the graph. Depicted in green is actually Moore's Law, as if you were simply doubling things every 24 months. Notice that the y-axis is logarithmic. So this is not a linear scale, it's a log scale. And this is actual data from our sequencing centers. Here you can see the middle of the genome project, and about the time the genome project ended. This is using old-fashioned Sanger-based sequencing, the kind of sequencing we did to sequence the genome in the first place. Right here is when our centers started to use these next-generation sequencing technologies that I've been telling you about, and here's where they really deployed them. For a while, we were just keeping up with Moore's Law, and then with these new technologies I've been telling you about, we're blowing Moore's Law out of the water. And this is real data up until just a few months ago. So this is just phenomenal, in that our sequencing costs have dropped and have far exceeded Moore's Law. Now, some of you I know might be more graphically or numbers oriented, and you sort of really get this slide.
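[Editorial aside: the Moore's Law reference line on a log-scale cost chart, as described above, can be sketched in a few lines of Python. This is only an illustration of a "halving every 24 months" baseline. The starting cost of 100.0 is an arbitrary placeholder and not a figure from the talk, and the function names are hypothetical.]

```python
def moores_law_cost(start_cost, months_elapsed, halving_period_months=24.0):
    """Cost expected if cost simply halves every `halving_period_months`.

    Mirrors the reference line described in the talk: capability doubling
    every 24 months is treated here as cost per unit halving every 24 months.
    """
    return start_cost * 0.5 ** (months_elapsed / halving_period_months)


def beats_moores_law(start_cost, observed_cost, months_elapsed):
    """True if an observed cost fell faster than the Moore's Law baseline."""
    return observed_cost < moores_law_cost(start_cost, months_elapsed)


if __name__ == "__main__":
    start = 100.0  # arbitrary placeholder units, NOT data from the talk
    for years in range(0, 11, 2):
        baseline = moores_law_cost(start, years * 12)
        print(f"year {years:2d}: Moore's Law baseline cost = {baseline:.3f}")
```

[On a logarithmic y-axis this baseline is a straight line, so any observed cost curve that bends below it is falling faster than Moore's Law, which is the comparison the slide makes.]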
Some of you tend to be more pictorially oriented, so let me give you a real example to help you understand exactly what's going on here. So if you ask me the question, how many human genomes can you sequence for $10 million, okay? If we go back a decade, okay, a decade ago in the year 2000, this guy was in charge of the National Human Genome Research Institute. And if you gave him $10 million, he would deliver less than one human genome for you, okay? But if you fast forward one decade, to 2010, I'm now in charge of the Institute. And look how many human genomes you get with that same $10 million. So this is what we call, in the genomics industry, an upgrade, okay? Just so you understand, that's what we're talking about. So this is visually something to keep in mind, and I think it will sort of lay the groundwork for what you're about to hear. Now, more seriously, how are we deploying this sequencing capacity? What I will tell you is that if you sort of think about a timeline, several years ago, in fact, I think when Les and I gave Grand Rounds last time, the project that he described, which he'll be telling you more about, was mostly looking at hundreds of individual genes, candidate genes that we wanted to study. But these new technologies have moved us to the point where it is now extremely routine to sequence all of the exome, in other words, all of the coding regions, those parts of the genome that directly code for protein, and that's called whole exome sequencing. And then of course, you could do a little bit more than whole exome. You could get all the exome, and you can get some of the regulatory elements or other known functional elements. And then of course, eventually you get the whole darn thing. Get a whole genome, all three billion bases, and actually you need to get six billion bases because of the two haplotypes you need to deal with. And where are we right now on this? We're basically right there.
That is, we can readily get a whole genome sequence. You'll hear about that in the next talk, but we now routinely do and can readily afford whole genome sequencing, or whole exome sequencing, if you will. And what's remarkable is we've moved most of the way across this slide really in the course of just the past couple of years. So that's what we can do. We can sequence whole genomes, although they still cost maybe $10,000 to $40,000. We can also sequence whole exomes, and those cost only a handful of thousands of dollars. But you can see we are en route to getting down to $1,000 or so, we think, for a whole genome, maybe in the next few years. These reduced costs are being applied to study different types of diseases. So for example, rare genetic diseases are seeing an upsurge, if you will, in terms of discovery. I think practically now you can open any month's issue of Nature Genetics, such as the one last month, I think it was. This is just an example of a rare disease called Kabuki syndrome, whereby the specific way they identified this gene was to take patients with this disorder and simply go in and directly sequence their entire exome using these next-generation sequencing technologies. And I know of at least a half dozen similar stories that are coming out in the literature over the next few months, and I think it's one of these things where every month, or even several times a month, we're gonna see another rare genetic disease that was resistant to identification of the gene using conventional approaches, where now, when you can sequence an entire exome, it just becomes something that's much more routine. The other major class of diseases where this is being really taken out for a test ride, of course, is in the area of cancer.
And The Cancer Genome Atlas, a joint effort between the National Cancer Institute and the National Human Genome Research Institute, is really taking out these next-generation sequencing technologies and taking a very aggressive approach to sequence hundreds of tumors for multiple different kinds of cancer, to develop comprehensive catalogs of genetic changes that are associated with different types of malignancy. And certainly what's gonna happen with time is that for complex genetic diseases, whole genome sequencing and whole exome sequencing will be used on a much larger scale to tackle the hard problem of actually identifying specific variants that are within candidate intervals and finding out which ones are the ones uniquely causing or conferring risk for the specific disease. And there are other uses of next-generation sequencing technology. I shouldn't imply it's all about disease, that it's all about human health. Any of you who read the Washington Post last week probably were far more interested in what you read than what I'm telling you about now because yes, the cocoa plant, which makes chocolate, had its genome completely sequenced, and the idea was, could you improve the taste of chocolate by understanding the genetic basis of different features of chocolate? And so for you chocoholics out there, at least you should also recognize these same technologies are very relevant for your passion, if you will. Now what this is leading to, getting back to humans, not chocolate, is an era of personal genomics, of individuals' genome sequences. This was featured on this issue of Nature earlier this year, or maybe it was the year before. There are even whole meetings, one of which I attended just a couple of weeks ago, that are set up around the topic of this grand rounds, and that is personal genomes, because this is a huge field and has huge challenges, as you'll hear about, and there is recognition that this actually requires a significant amount of attention to truly pursue.
I'll also tell you the notion of personal genomes also brings other aspects to it, including great interest among famous people, for example, in getting their genomes sequenced and getting that reported either in the literature or by press release. This all started when Craig Venter had his genome sequenced and published it, and then Jim Watson had to get in the act and publish his genome. But then 2010 has been a rather interesting year of what I call celebrity genomics. So there was a nice paper here about sequencing some individuals from southern Africa that actually had some very interesting findings. One of the individuals they sequenced was Desmond Tutu, so again, celebrity genomics. Now that's interesting. And then a good friend of mine and collaborator, Jim Lupski, who is a medical geneticist, did it for his own medical reasons, because he has Charcot-Marie-Tooth disease but never did figure out what genetic mutation caused the disease in him and his family. He sequenced his genome and some siblings' genomes and figured it out, and that was an interesting story. But then Steve Quake, shown here, is one of the people that's developing some of these technologies, and so he used his technology to sequence his genome and published the findings from that in The Lancet. And then it started to get a little weird, okay? This year, I have to admit. So, you know, for example, all the people that were known and famous and had their genomes sequenced, they were all guys, right? Well, what's with that? And so Illumina, one of the manufacturers of instruments, said this isn't good, we need to have a famous, known woman and get her genome sequenced. So they sequenced Glenn Close, for whatever reason, and had a press release about it, and then it was picked up by lots of news channels and media that publicized it. I still don't quite understand why they did it. Maybe they were looking for the Fatal Attraction gene or something, I'm not sure.
And then it went from weird to just bizarre because several, maybe a month or two ago, a small company in St. Louis decided they were gonna sequence Ozzy Osbourne's genome and the reason for it is they wanted to figure out the genetic basis for why this guy was still alive with all the drug abuse. Okay, now the truth of the matter is I show these slides in part to be humorous and to be able to say that here is the head of the National Human Genome Research Institute, we had this $1,000 genome effort and wow, this is what it's delivering. In fact, this actually bothers me to be quite sincere. This is not the slide I wanna show. What I am looking forward to is maybe a year from now. When this is the slide I'll show. All the people who are known who have had their genome sequenced and if things go well six months later, this is the slide that I will show and six months after that, this is the slide that I will show. Because when I can show you slides like this with all these people whose genomes have been sequenced, it means we've moved beyond celebrity genomics and instead are now sequencing human genomes for the purpose of clinical research and clinical care at places like this and that's the reason why we're developing these technologies. So I've given you great optimism for what's going on in human genome sequencing. Let me be a little bit realistic here. It's not all as straightforward as I might portray. If you ask me what's the biggest bottleneck, not just in sequencing but all of genomics, actually I think probably in all of biomedical research, this is the metaphor I would give you. These sequencing machines are spewing out data far faster than we can collect it, far faster than we can analyze it and it's absolutely becoming an issue where data generation is not limiting by any means. Data analysis is what's limiting. We are in an era of big data that biomedical research has never faced before. 
Other scientific disciplines have faced it, as Nature talked about in this article, but for biomedicine to have that kind of data is new to us, and in some ways we need to think about how we're going to deal with it. I refer to this as the computational bottleneck. I admit it has many components associated with it. Some of it is just sheer hardware and infrastructure for pushing huge amounts of data around, storing them, and having processor power to analyze them. We don't have adequate software tools yet. Many are in development for analyzing that kind of data, sifting through it, finding the relevant stuff, even figuring out what the real stuff is and what the variants are. And we just don't have enough people. There are simply not enough people who are trained in biomedicine and also in computational biology and computer science, and we need to be thinking about that pipeline as well. The fact is, even when we fix the computational bottleneck, and even if I said, oh, we can analyze all the data, oh, we can come up with all sorts of lists of variants associated with individuals, the truth is these fancy new technologies that are reading out DNA sequence are just spewing out sequence of human genomes, and even if we have those beautiful lists of all the variants that exist for a given patient, it still will be puzzling to know which of them are phenotypically or medically relevant and which ones are not. I think this is very much a slide that Les will resonate with. This is probably what he feels like when he goes on rounds sometimes. Yes, we have this sequence, but we're not totally sure what all of it means yet. This is the grand challenge. Harold Varmus, who is known to all of you and now sits around the Institute Directors' table as head of NCI, in a commemorative piece he wrote in the New England Journal earlier this year celebrating the 10th anniversary of the human genome sequence, wrote, physicians are still a long way from submitting their patients' full genomes for sequencing, not because the price is high, but because the data are difficult to interpret.
So this is sort of the grand challenge, if you will: we have this technological capability, and here we have before us this incredible opportunity to marry genomics and genome sequencing capabilities to all sorts of problems that really rest at the heart of why we have a clinical research center here and what the future is gonna be for genomic medicine. And I can tell you, as an Institute, we are in the middle of a strategic planning process. Actually, we're near the tail end of a strategic planning process. And early next year, we'll be publishing a document that will really feature our vision for where genomic medicine is heading, and I will tell you it is very much focused around how we are gonna move from having this technological capability to sequence genomes to actually being able to apply it in ways that are going to yield an era of genomic medicine that is not only gonna be effective, but is going to be one in which we're able to improve the delivery of healthcare. That's a tall order. It's not anything that's gonna happen in the next five or 10 years completely, but it's one that we're certainly passionate about being able to be successful at. And hopefully I'll be able to come back to future grand rounds and tell you how we're making that progress down the road. I'll just leave you with a quote because I think it's very reflective of what everything I said basically is all about, and it's a nice setup for Les. This individual, Geoffrey Carr, wrote a very nice series of pieces celebrating the 10th anniversary of the genome sequence in The Economist. And he said a lot of interesting things. I just like this one in particular. He said the race was to sequence the human genome, all three billion genetic letters of it. A race not to the finish, but to the starting line.
And in many ways that's where we are right now. We are at a starting line. But it's important to recognize that the new race that is marked by that starting line is really a marathon. And what I would very much like to see, first of all, is recognition that it is a marathon. It's gonna require a tremendous amount of creative energy to realize how to go from where we are now to really changing how we practice medicine. But what I passionately wanna see is the Clinical Research Center playing a major role in that. I think that's exactly what we should be about, and I think we're perfectly situated to do that. So with that I will stop, and I'm gonna turn this over to Les. We'll save questions for the end. And what Les is gonna tell you about is very much a first step on this marathon. And the first step is a discussion and a description of the first Clinical Center patient who's had their entire genome sequenced. And that's sort of the major focus of his talk. So I'll turn this over to him. Thank you.
Good afternoon. If we could flip over to my presentation, that would be great. Okay, so what I wanna do is talk to you about some challenges and opportunities that we are facing with these new technologies. And this is the polite title. It could probably be renamed Excitement and Terror, but I don't wanna scare you right off the bat here. So let's talk about what some of these data are and how we are approaching that. Okay, first I have my obligatory disclosures: none to declare. And then what I'm gonna tell you about here today. So I wanna provide a little bit of background on the clinical study that we're doing here at the Clinical Center from which these data have emanated, a little bit about the design and how we are approaching the sequencing of the patients in the cohort.
And then go on to describe the first NIH Clinical Center patient from whom we have whole genome sequence results: the research data, an example of how we're gonna use these data to answer research questions, some of the clinical implications of generating these data sets, and what we need to do to be responsible to the needs and expectations of our patients. Then I wanna go a little bit broader and give you a feeling for the spectrum of the kinds of results by looking across a number of exomes that we have done, to give you a flavor for what kinds of things we can expect to find in the clinical annotations that result from that. So what I'd like to do to start these kinds of talks is to issue you a challenge. And that challenge is this. If you imagine yourself in a clinic room here or on a ward with a patient on whom you have whole genome sequence data, what biomedical question would you ask? And that's a radical question to pose, because none of us are used to thinking that way about researching our patients. But this reality is here, and you can begin to think about how you would approach problems in a novel way because you can begin to have access to these scales of data. The second question is, what question do you think that patient is going to ask? That's an important question too, because these patients come to us for clinical research studies with their own expectations and hopes for their futures. And we need to be responsive and responsible in our answers to those questions. So how we're approaching this is to acknowledge the fact that we've been doing lots of genetic research in lots of different ways for many years now, and this is just a different flavor and a different dimension of those kinds of research. So we've been doing lots of single gene studies. Many people in this room have participated in this kind of work.
Some individual genomes are becoming available to us now, as well as cohort studies of anonymized subjects from whom we have variation data but without clinical data. But what we really want is complete genome interrogation, or complete genome breadth. We want deep and robust clinical data, and we want it on lots of people. We're greedy. We want to answer important biomedical questions, and we need numbers in all of these dimensions to do that. Now the ideal study is out here in this genomic research space, but we can't start there. We have to start at a smaller scale with studies that are designed to model these sorts of research questions, to build the infrastructure we need to move toward this ideal goal, and then scale things over time to take advantage of these technologies. So with ClinSeq what we're doing is targeting an initial phenotype of atherosclerosis, a common disorder with clear heritable components, and we know that the heritability of that trait is complex and includes both rare and common variations that lead to that phenotype. We generate sequence data, which I'll show you, and then do follow-up studies to answer these important research questions that we're trying to get at as to what's wrong with our patients and how we can treat them, how we can improve their lives and extend their life spans. We're also trying to interpret variations that we find in our patients that are clinically relevant, use those data appropriately, and return them to the patients, not only to be responsive to their needs but to begin to develop approaches for how you deal with individual patients when you have large-scale data sets such as these, and returning the results is a big challenge.
So I won't go into the technology all that much because Eric has, but next-generation sequencing can be applied directly to interrogation of entire genomes, or you can select subsets of the genome that you wish to interrogate by using selection or capture methodologies that pick certain segments of the genome, select those, and then put those into the machine, which is what we call exome or exon sequencing. I'm gonna give you examples of both here today. So, progress on our study. When we first designed the study back in 2007, we actually weren't sure if very many people would even be remotely interested in enrolling in such a study, where you offer to sequence the entire genome of patients and potentially return results. Turns out, fortunately, we were wrong in worrying about that, and we've recruited nearly 825 patients to the study as of last month. The sequencing has gone faster than we expected back in 2007, and we have already completed 175 complete exome sequences of these patients, and that is a sum total of about six billion base pairs of coding sequence. Remember, coding sequence is just a small subset of the genome, the parts of genes that encode proteins. And we've also completed two whole genome shotgun sequences, which in those two patients is more than six gigabytes of sequence data, so huge amounts of data. So really, like they say when you get on the airplane, please make sure your seatbelts are securely fastened, because we're going for a ride here. All right, I'm gonna tell you about our first patient.
This man is a 47-year-old male who came to us, and in taking his history and his family history it became clear that he had an unusual presentation, and that is that he had significant coronary artery disease as measured by his coronary calcium score by CT scan of 292, which is way out of the realm of normal. And when we did his family history it became clear that he had a family history of autosomal dominant coronary atherosclerosis and myocardial infarction, a very unusual phenotype because in his family that phenotype is unassociated with hyperlipidemia. We all know hyperlipidemia is a common cause of atherosclerosis, but this family has high-penetrance myocardial infarction without hyperlipidemia. So the family history looks like this. Here's the arrow for our proband, and you can see he comes from a sibship, seven out of eight of whom have had either myocardial infarction or significant atherosclerosis, and a huge pedigree peppered with individuals with early onset myocardial infarction. So we felt that this patient was a perfect test case to feed into the system of high throughput sequencing as a whole genome sequence. Okay, so here we go. What kinds of numbers of data do such approaches generate? We generated in these next-generation sequencing instruments 1.4 billion individual sequence reads, which generated a gross overall total amount of sequence of 133 billion base pairs on one patient. Those sequences are then aligned to the reference genome and interpreted. And when you align those sequences to the genome it generates an overall coverage, or depth, or redundancy if you will, of about 55x across the entire genome. When you compare that genome to the reference sequence, what do you find? A lot of variation. In total this patient had more than three and a half million DNA base pair substitutions when compared to the reference sequence. Three and a half million variations.
Two and a half million or so of those were heterozygous, 1.2 million were homozygous, and he had a total of about 10,000 non-synonymous, that is, amino acid changing, substitutions in the coding regions of his genes. In addition to substitutions he, like all of us, also has differences in insertions and deletions in his genome. The small ones that we can measure with the sequencing technologies include about a third of a million heterozygous small insertions or deletions, which are about 10 base pairs or less, and then about 80,000 homozygous changes. So enormous amounts of inter-individual variation can be generated and analyzed with this technology, and the question is, how do you make use of that? So we are using these data to try and identify the gene that is associated with the family's early onset myocardial infarction. We haven't completed that study yet, but I'll give you a little progress update. We're using the familial samples from other members of the family that we have brought in to do some linkage, and we are merging linkage data with whole genome sequence data to begin to isolate that gene. And if you look across the genome, these are just the numbers that I gave you on the previous slide, and then in his exome you can begin to say, okay, let's just look at variations within coding regions of genes, and that narrows things down a lot. And you can do what we call filtering: instead of just doing linkage, we now do filtering approaches where we set certain criteria and push out variants that fail to meet those criteria. So we look at the total, we look at those that are in coding regions, we look at changes that are non-synonymous and that are not common variants in the database of single nucleotide polymorphisms, and then we can further focus down on variants that are in the regions that are linked to his phenotype in his family to begin to narrow.
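The filtering cascade described here (all variants, then coding, then non-synonymous, then not a known common polymorphism, then inside a linked region) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Variant` fields, the `filter_cascade` function, and the toy data are hypothetical names and values invented for the example, not the study's actual pipeline or results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    # Hypothetical, simplified annotation record for one variant.
    chrom: str
    pos: int
    in_coding_region: bool
    nonsynonymous: bool
    in_snp_database: bool  # stand-in for "common variant in the SNP database"

def in_linked_region(variant, linked_regions):
    """True if the variant falls inside any (chrom, start, end) linkage interval."""
    return any(chrom == variant.chrom and start <= variant.pos <= end
               for chrom, start, end in linked_regions)

def filter_cascade(variants, linked_regions):
    """Apply each criterion in turn and record how many variants survive each step."""
    steps = [
        ("coding", lambda v: v.in_coding_region),
        ("non-synonymous", lambda v: v.nonsynonymous),
        ("not a known common SNP", lambda v: not v.in_snp_database),
        ("in linked region", lambda v: in_linked_region(v, linked_regions)),
    ]
    remaining = list(variants)
    counts = [("total", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts

# Toy data, invented for illustration only.
toy = [
    Variant("chr1", 100, True, True, False),    # passes every filter
    Variant("chr1", 200, True, True, True),     # known common SNP
    Variant("chr1", 300, True, False, False),   # synonymous
    Variant("chr2", 100, True, True, False),    # outside the linked interval
    Variant("chr1", 400, False, False, False),  # non-coding
]
linked = [("chr1", 50, 350)]  # hypothetical linkage interval
survivors, counts = filter_cascade(toy, linked)
for name, n in counts:
    print(f"{name}: {n}")
```

Recording the count after each step mirrors the way the narrowing is presented in the talk, as a sequence of shrinking numbers rather than a single final list.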
So you can go from three and a half million variants to 130, 150 or so by using these various criteria and the various genetic approaches that you have. Okay, beyond the research needs, what do we have to do with these data sets to be responsive to the clinical needs of the patient? It turns out you can do a lot with these data, and we have obligations, I believe, to do a lot with these data to answer these needs and expectations. One thing we can do: you remember a few slides back I said the overall coverage of the genome was about 55x. You can use the coverage of a genome as a proxy for measuring copy number variation, or large-scale insertions and deletions. So you measure the coverage ratio across the genome looking at windows, compare our patient's, or test, genome to a reference genome, and ask the question, does our patient differ? When you do that across the entire genome, again you get thousands of variations, but you have to focus in on some. How do you do that? So in browsing through our patient's genome, we can load our patient's genome coverage data into the UCSC browser and scan it visually. What you see is the genome of a control sample here, and this is the genome of our patient, and what you see is a profile, because the sequencing instruments have preferences for certain regions of the genome that are a little easier to sequence than others. So the easier ones give high peaks, things that are difficult to sequence give valleys, and you can see that the profile is pretty similar except for one spot on this genome right here. Our patient has a notch in his coverage, and the average coverage of about 55x drops to about 25x or 26x in this region, which is about one and a half million base pairs in length. So if you zoom in on the browser you can look at this more closely, and you can see a very clear boundary here where the coverage drops and then it goes back up and matches the profile of the other patient.
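The windowed coverage comparison described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function names, the window size, the 0.6 ratio threshold, and the depth values are all invented for illustration, and the talk describes visual scanning in the UCSC browser rather than this particular calculation. Each sample is normalized by its own median window depth so that overall depth differences (for example 55x versus 50x) cancel out, and a window where both samples dip together (a hard-to-sequence region) is not flagged.

```python
from statistics import median

def window_means(depth, window):
    """Mean depth in consecutive, non-overlapping windows."""
    return [sum(depth[i:i + window]) / window
            for i in range(0, len(depth) - window + 1, window)]

def flag_low_coverage(patient_depth, control_depth, window, max_ratio=0.6):
    """Return (window_start, ratio) for windows where the patient's normalized
    coverage is well below the control's. A ratio near 0.5 is what one would
    expect for a heterozygous deletion. Assumes both medians are non-zero."""
    patient = window_means(patient_depth, window)
    control = window_means(control_depth, window)
    p_norm, c_norm = median(patient), median(control)
    flagged = []
    for i, (p, c) in enumerate(zip(patient, control)):
        if c == 0:
            continue  # no usable control signal in this window
        ratio = (p / p_norm) / (c / c_norm)
        if ratio <= max_ratio:
            flagged.append((i * window, ratio))
    return flagged

# Toy per-position depths, invented for illustration.
# Positions 40-49 are a shared "valley"; positions 70-89 dip only in the patient.
control_depth = [50] * 40 + [30] * 10 + [50] * 50
patient_depth = [55] * 40 + [33] * 10 + [55] * 20 + [26] * 20 + [55] * 10
flagged = flag_low_coverage(patient_depth, control_depth, window=10)
for start, ratio in flagged:
    print(start, round(ratio, 2))
```

Using the ratio to a control, rather than raw depth, is what separates the patient-specific notch from the peaks and valleys that every genome shows on the same instrument.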
So what's this about? Turns out this is a well-recognized copy number variation in humans, where deletions of this region cause a phenotype called hereditary neuropathy with liability to pressure palsies, or HNPP, and duplications of this region are another cause of Charcot-Marie-Tooth disease, a form of the disease related to the one Eric mentioned that Jim Lupski had described in the paper about him. So what do you do with this? Turns out this is a clinically recognized test, so we took these data and we CLIA-validated the result, which of course confirmed it, and we brought the patient back to the NIH Clinical Center and counseled him on this test. And it turned out, when we began to discuss the results and I started my nice spiel about HNPP, the patient immediately recognized it, interrupted me, and just said, "I have this." And it turns out he'd had this for years, probably more than a decade, had been misdiagnosed as having a spine abnormality, and he'd actually been recommended to have surgery for that, which thank goodness he had declined. So here's an example of a patient where we can take a genomic result, clinically interpret it, return it to a patient and use that result medically, and counsel him and his family members on the inheritance of this copy number variation and the clinical implications. What's really weird about this, though, is this is not how we normally do medicine. We don't just troll through people's results and make diagnoses, but in fact that's how this is done. So we diagnosed correctly that this patient had HNPP without a history that suggested that he had the disease, without the exam that you would do to rule that in or out, and without a clinically indicated test to interrogate for that disease. You start with the genomics and you go to a diagnosis, which is a radical and different way of thinking about how we do medicine. You can ask other questions about these genomes. Does this patient have other point mutations that are known to cause human genetic disease?
How do you do that? Well, you can intersect our patient's results with known lists of disease-causing mutations. So this is the Human Gene Mutation Database, and it includes 96,000 to 97,000 mutations in 3,600 genes that are said to cause human genetic disease, and you ask which of those variants are present in our patient. When one does that bioinformatically, we ended up with 64 candidate variations. We then had to curate or filter those. We excluded a number of them because literature review said that the database had incorrectly attributed causation to those variants. We excluded a number of them, 17 of them, because it was in fact the case that the variation was present in the reference sequence and our patient had the wild type, because again, the reference sequence is another person. That person has their own liabilities and susceptibilities to disease. Turns out the reference sequence has the mutation. Our patient is normal. And five resulting variants were apparently causative of human genetic disease. So what were those? Here are the results of that. Four are well-recognized autosomal recessive traits in humans that our patient is a carrier for: a skin disorder, ectodermal dysplasia; Pendred syndrome, a disorder that causes deafness and thyroid difficulties, actually positionally cloned by Eric Green and colleagues a few years back; galactosemia; and a rare form of anemia. And so this patient is a carrier for a number of traits, as all of us are. What's really odd is that we also ended up with the patient having a variation in a gene, RP2, that is said to cause X-linked recessive retinitis pigmentosa. The odd thing was that when we looked up this disorder, it turns out the age of onset for the average affected person is nine years of age. And as I mentioned, our patient is much older than that and has perfectly normal vision. So what's going on with that? If you pull the primary literature on that, this is what you find.
It turns out that X-linked retinitis pigmentosa can be caused by mutations in any of five genes, but only two of those genes have been positionally cloned. So what probably has happened in the literature, and this is an important thing to recognize, is that causation has been attributed to this variant incorrectly. And it's in the literature, it's in the databases, but it does not cause X-linked retinitis pigmentosa. So doing the return of clinical results is practical. We can do these kinds of returns of results. It requires a lot of infrastructure, counseling, and thought to go into it. In the pre-return counseling, when we counsel the patients to determine if they wish to have the results returned, this particular patient was quite hesitant about these recessive traits and expressed misgivings about the utility of knowing if he was a carrier for recessive disease. But in the end, he said, "But I really wanna know anyway, just curiosity." Early adopter, technologically friendly patients really wanna know these data. The number of variations that we find is close to predictions. Population geneticists estimated long ago that the average human being is a recessive carrier for between five and eight severe disorders. Our patient had four, right close to the prediction. The analysis that's required to do these sorts of things is complex. It's neither straightforward nor easy, and a fair amount of filtering, review, and curation is required to make these analyses work. And we have to be very careful of the fact that in our literature and in our databases there are false associations of variation with disease, and these are potentially risky situations where we could falsely attribute risk for disease to a patient and incorrectly counsel. So we'll have to watch for that very carefully.
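The intersect-then-curate workflow described above can be sketched as a dictionary lookup followed by two exclusion sets. Everything here is a hypothetical illustration: the variant keys, the disorder labels, and the function names are invented, the real database has its own format and access terms, and in the talk the curation sets came from manual literature and database review rather than from code.

```python
def candidate_disease_variants(patient_variants, disease_db):
    """Intersect the patient's variants with a table of reported
    disease-causing mutations. Keys are (chrom, pos, ref, alt) tuples."""
    return {key: disease_db[key] for key in patient_variants if key in disease_db}

def curate(candidates, refuted_by_literature, reference_carries_mutation):
    """Drop candidates that curation has flagged: entries where literature review
    found causation was wrongly attributed, and entries where the reference
    sequence itself carries the reported mutation."""
    return {key: disorder for key, disorder in candidates.items()
            if key not in refuted_by_literature
            and key not in reference_carries_mutation}

# Toy data, invented for illustration only.
disease_db = {
    ("chr1", 1000, "A", "G"): "disorder A (recessive)",
    ("chr2", 2000, "C", "T"): "disorder B (recessive)",
    ("chrX", 3000, "G", "A"): "disorder C (X-linked)",
    ("chr3", 4000, "T", "C"): "disorder D (recessive)",
}
patient_variants = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chrX", 3000, "G", "A"),
    ("chr5", 5000, "G", "C"),  # not in the database
}
refuted_by_literature = {("chrX", 3000, "G", "A")}       # hypothetical curation result
reference_carries_mutation = {("chr2", 2000, "C", "T")}  # hypothetical curation result

candidates = candidate_disease_variants(patient_variants, disease_db)
reportable = curate(candidates, refuted_by_literature, reference_carries_mutation)
print(len(candidates), "candidates,", len(reportable), "after curation")
```

Keeping the exclusion sets as separate inputs makes the point from the talk explicit: the lookup is automatic, while the decision about which database entries to trust is a curated judgment that has to be maintained over time.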
The beauty of that, though, is that a cohort like this and results like this are an unbiased set of patients in whom we can objectively determine that such errors are present in the literature, correct them, and improve our databases and our knowledge of variation. I next want to move briefly over to exomes. So instead of focusing deeply on a single patient, we move to a large set of patients and ask what kinds of medically important information are present in the genomes when you interrogate an exome of a patient. So what we wanted to do is pick a set of genes that were obviously clinically important and relevant for patient care, and we landed on selected cancer susceptibility disorders, because they're typically inherited in an autosomal dominant pattern, and heterozygotes should be detectable and at some significant frequency should be identified by approaches like this. So we took a review article that reviewed a number of cancer susceptibility syndromes, 37 of which had identified causative genes, and asked the question, how many variants are present in those genes in 120 exomes of our patients? What we found was a total of 29 variants across 10 genes. And again, we filtered. We filtered those based on frequency, reasoning that these rare cancer syndromes cannot be caused by common mutations in these genes, and then did a manual literature and database review to further filter the remaining genes to identify relevant findings. What are some examples of what we found? So the most impressive example we found is a 49-year-old man who has a known pathogenic allele in the breast and ovarian cancer susceptibility gene 2, BRCA2. It's a frameshift mutation. This mutation has been reported 41 times in high penetrance breast and ovarian cancer families. When you look at this pedigree you see something really striking. Where are all the people with breast and ovarian cancer? There are very, very few in this pedigree.
The family history indicates only that his mother may have had breast cancer and may have died early. He's not certain of the age. He isn't certain of the diagnosis. The other peculiar thing about this pedigree is that it is very much biased toward males. There just aren't that many females in this family history who would be at risk for breast and ovarian cancer. So this is not a high risk breast and ovarian cancer pedigree, and it would never have led this patient to end up in a cancer genetics clinic to be assessed for BRCA2, because the risk from the pedigree is just not high. Yet what we have shown is that this patient in fact does have such a variant. And what's amazing about the variant is that when you do the math, it turns out that the relative risk of cancer for BRCA2 variants is actually higher for males than it is for females. It's only about a 10-fold relative risk for females because of the relatively high baseline frequency of breast and ovarian cancer. But because of the low population incidence of breast cancer in males, a 6% lifetime incidence, which is what male patients with BRCA2 variants have, gives this man a 70-fold increased risk, and it is a preventable cancer in males. Screening and prophylaxis could be performed on affected relatives in the family, again in a family who would never have been diagnosed absent genome interrogation. There are other, more challenging variations to think about. We found variations in the RET gene and the fumarate hydratase gene. Both genes cause rare forms of familial cancer. These are variants of uncertain clinical significance, and it is more difficult to know how to proceed with our patients with these variants. This will require exploration and thought by us, our IRB, and our ethics colleagues to know how we should pursue these and work with our patients to appropriately return these results. So again, this is an unusual way to ascertain families for cancer predisposition.
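The relative risk arithmetic mentioned above is simply carrier risk divided by baseline risk. The short sketch below uses only the two figures quoted in the talk for males (a 6% lifetime incidence in carriers and a 70-fold relative risk) to back out the baseline those figures imply. The function names are made up for the example, and the second baseline is a purely hypothetical number chosen to show how a higher population baseline shrinks the relative risk for the same absolute risk.

```python
def relative_risk(carrier_risk, baseline_risk):
    """Relative risk = lifetime risk in carriers / lifetime risk in the population."""
    return carrier_risk / baseline_risk

def implied_baseline(carrier_risk, rr):
    """Baseline lifetime risk implied by a quoted carrier risk and relative risk."""
    return carrier_risk / rr

# Figures quoted in the talk for male BRCA2 carriers.
male_carrier_risk = 0.06
male_rr = 70
male_baseline = implied_baseline(male_carrier_risk, male_rr)
print(f"implied male baseline lifetime risk: {male_baseline:.3%}")

# Hypothetical illustration: the same 6% absolute risk measured against a
# baseline that is seven times higher gives only a 10-fold relative risk.
hypothetical_higher_baseline = male_baseline * 7
print(round(relative_risk(male_carrier_risk, hypothetical_higher_baseline), 1))
```

The design point is that relative risk depends as much on the denominator as on the numerator, which is why a modest absolute risk in males can correspond to a much larger fold increase than a higher absolute risk in females.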
The lack of bias in ascertaining the families, i.e., not ascertaining families by the phenotype itself, is both a strength and a weakness. It's a strength because it can give us an unbiased mathematical estimate of the risk of having these traits, but it's a weakness in that we don't really know how to work with families who don't have the disease already manifesting in high-penetrance situations to figure out how to manage them appropriately. So we will need to develop experience in how we ascertain and manage patients by this approach. So there are many clinically relevant variants that are going to be found in these genomes and in these exomes. They're of all different kinds: copy number variations, cancer susceptibility, carrier status, et cetera, et cetera. The paradigm shift here is that, again, we are interrogating potentially all genes without a clinical indication to look at a given gene. So it will require new ways to think about how we handle these results, how we return the results, and how we manage the patients medically going forward. Not only that, but we are going to have to improve and refine our approaches to analyzing these data, because as you can see, our manual curation is very time-consuming, and we are going to have to automate and increase the throughput of that, again because of this flood of data that Eric talked about in his slide. My own feeling is that the responsible analysis and return of these results is essential to supporting our research mission of finding new relationships of genes to disease, and it will also be a technology that will enter clinical practice. So we, as researchers, need to be mindful that the paradigms that we develop and the approaches that we use will eventually diffuse into clinical medicine, unlike previous genetic technologies, which always were going to stay within the research realm. So I want to stop there with those examples of the results.
There's a huge list of people who are helping to pull off the study. Thank you for your attention. I'd be happy to take some questions. Any questions?
So, right, both the HNPP phenotype that this patient has and the four carrier traits that we have are, I believe, true, true and unrelated to that family's risk of myocardial infarction. Those are separate traits, and it's an example of the complexity of a genome. All of us have multiple traits segregating within our families for susceptibilities to different disorders. And untangling the separate and distinct influences of those genes for those traits is the challenge that we face.
Can I answer that? So, to summarize the question, which is a totally fair one, it's that we will get to a point in generating the data and analyzing the data where we probably will fix the computational bottleneck. There's not a technological bottleneck anymore, and that will leave us with what I call the informational bottleneck, and then the closer and closer it gets to patients, it's not just understanding it, but really even how to deal with it and interacting with the patients. And one of your questions was, do we need more genetic counselors? And it was good that Les didn't answer, because he has a conflict of interest: he's married to the head of the NHGRI-Hopkins Genetic Counseling Training Program. So it's good that I'm answering this question. That should have been on the disclosure. It probably should have been on the disclosure. I didn't anticipate genetic counseling questions. And then you very nicely said, do we need more genetic counselors? Do we need to change how we educate physicians and other health professionals, nurses? The answer to all your questions is yes; we absolutely need to do all of that.
We also need to think about, we probably need to do some research in this area to figure out who are the best people and what's the best way to train them. We also have to educate the public. I can tell you our institute is interested in all of these things, and some of the emphasis we're putting is not only on physician education; the reality is that some of these interactions are gonna have to come from other healthcare professionals: genetic counselors, nurse practitioners, physician assistants, and so forth. But we also are gonna need to raise the genomic literacy, if you will, of the general public, and that feeds into a whole host of even harder issues around scientific literacy. So it's a real challenge. I mean, it's something that I can tell you, around our institute's strategic planning process, just comes up over and over and over again, recognizing that it's just gonna be the next grand challenge even once you get to the point of having lists of variants. I don't know, you can add to it, as long as you don't talk about genetic counseling. How's that?
Actually, when I interview fellows for the genetics program here, I tell them that I think the future of their careers and their lifetimes is going to be focused on medical and biological information, and that they need to be thinking of themselves as medical bioinformation specialists, because I think that's exactly what people are gonna have to be doing.
So of the 64, that's the way you haven't thought about it. It's more than a little piece, it's a big piece, and a database like that takes a prodigious amount of effort, to have 97,000 sequence variants in it and the relationship of those variants to phenotypes. And it's a mix of Mendelian variants and then GWAS-based associations and functional associations. And sometimes, because it's a human-based process, those wires get crossed, and that 43 is those wires getting crossed. And so number one is the databases need to be more robust.
And once we can clean up those databases and tighten up the data, then we can apply semi-automated tools to integrate and filter and do those things. And it's essential, because the amount of curation per patient is hours here, and so for our clinical cohort of 1,000 patients, we don't have the personnel to do thousands and thousands of hours of that sort of interrogation.
Yes. So what I will tell you is that all the big initiatives that I just briefly mentioned, whether it's the 1,000 Genomes Project or the HapMap Project or even many others, are putting lots of this data out. I mean, the great majority of the stuff being supported by our institute is publicly available, and increasingly so. Certainly the Cancer Genome Atlas, for example: just at a meeting last night I had with Harold Varmus and others, we were recognizing how important it is that all this data be put in publicly accessible or community accessible places, so that people who have different ideas on how to filter the data, how to analyze the data, how to figure out what the important variants are, can bring their new tools to bear on that. So absolutely, all that's being set up. Some of it, because it's human subjects data, is being set up behind controlled-access databases, whereby you have to at least get appropriate approvals for accessing human sequence data, but appropriate approvals can be gotten. Absolutely, that's the whole idea, especially behind these very large projects, to make that data available. The ClinSeq data are going to go both into the Short Read Archive, which is an open access database, and into dbGaP, and so investigators will be able to interrogate the exomes on a thousand patients by whatever approaches they think they ought to try, and that'll be a fantastic resource for people to work with.
So, the atherosclerosis patient that you did the whole genome sequencing on.
It looked to me in one of your tables like whole exome sequencing identified a larger number of variants within the linkage interval than the whole genome sequencing identified. I saw 137 versus 121, and I wondered if you could comment: is the whole genome missing more variants, or is the whole exome picking up more false variants?
Yeah, and that's a great question, because when we say whole genome sequence it is not 100% of the genome, and when we say whole exome sequence it is not 100% of the exome. Both methods interrogate between 85 and 90% of those denominators, and there is a bit of complementarity in the two approaches, so what we actually did is we took an exome sequence from that patient and the whole genome sequence and asked the question, how many variants can we find if we put those data together? Then it jumps to 92%. And so "whole" needs to be in quotes; actually, on the later slide you might have noticed I put "all" in quotes, and that's exactly why. It's a great question. Thank you.
Yes. Me like Ozzy Osbourne, that kind of. That's what I'm saying. All right, let me go. Well, yeah. So, I mean, that's really the $64,000 health care systems question that needs to be asked for 10 years from now, and I think the answer to that will be that a genome sequence will become a health care resource for a patient, and that we will start to do it routinely when it becomes the case that the cost of a whole genome acquisition falls to where it's about equivalent to what the average person will consume in genetic testing resources over a predicted lifespan.
So once you get to that point you might as well just acquire the whole data set. And then your point is when, and the answer is as early as is practical. And then what I would envision for the model is that the data set would be controlled by the patient, so that at appropriate clinical junctures in that patient's lifespan, when a question arises, the patient with their physician or other health care provider can access that resource and answer the question using a computer, instead of drawing blood and testing for that one thing. You just go in and use it for this. And so some people have suggested in fact that this should be done at birth, and that we should use it for a form of newborn screening. You could use it for vaccination reactions, right? You could use it for sports physicals. You could use it for prenatal testing. You could use it for Coumadin sensitivity, throughout the patient's life. Go in, use it, use it, use it. And actually, I think, in relation to your question about the health professionals, my feeling is that these three times ten to the ninth base pairs do not need to be communicated to the patient. That's foolish, it's folly, and in fact most of these data aren't really genetic testing data in the sense that you and I think of a genetic test, but it's a resource. And so when a patient needs to go on Coumadin, don't even talk about a genetic test. The physician would use the data to adjust the dose for the patient, and who needs to even talk about it? Why should anyone care that it's a genetic test? They're just improving the care of the patient. So in that way a genome can actually dissolve into medicine and be used by a lot of clinicians for different reasons, for different purposes, at different times in the patient's life, and then it becomes a fantastic resource.
I guess I'm not sure I know what you mean by meta... I thought you were going one direction and then you turned up. Metagenomics, meaning? Oh, epigenomics.
Okay, don't confuse it with metagenomics; it's slightly different. So Les and I very gracefully only talked about the primary sequence of human genomes, and by no means would we want to imply that that's the only genomic information out there. We know that our genomes are decorated with methyl groups and packaged with histone proteins and so forth, and all these marks are left on our genome in ways that we're just barely scratching the surface of, including what their relevance will be for diagnostics. I can't help but point to next week's Grand Rounds, where you'll sort of see another major avenue of genomics, which is much closer than even the routine sequencing of genomes before patients are symptomatic. That's in the arena of cancer. And in cancer, of course, there are lots of ideas and lots of data to suggest that maybe epigenomic marks are very important in the derangement leading to cancer. And absolutely, the only thing I will say, other than that we didn't really talk about it much but acknowledge it's there and it's huge and it'll be complicated, is that the technologies that I described, these next-generation sequencing technologies, have completely accelerated, I mean unbelievably accelerated, our ability to study epigenomic phenomena, because basically those same technologies are used as assays for figuring out exactly where the epigenomic marks sit. And so lots is happening in that arena; we just didn't happen to talk about it. So when we've solved the primary sequence issues and we understand it completely, then we'll move into epigenomics, and then we'll get into metagenomics and the microbiome, but that's a whole other Grand Rounds.
Okay. Well, thank you. Those were great presentations, great questions.