If you could get your last bits of coffee and everybody come sit down, please. That'd be Maggie McGuire and Ray McDougal. You weren't even hearing me. Welcome to the NHGRI Science Reporters' Workshop. We have a lot of stuff to cover today, so we're going to try to keep things moving along. Dr. Green will be the emcee, but I'll be behind him pushing this as much as I can. But this is really about getting your questions answered and having conversations. So the presentations will be relatively short, and then we really are looking forward to a spirited discussion. A couple of housekeeping things. In case you didn't see them, the bathrooms are all the way back where the escalators come down. We probably don't have enough breaks built in, so if you need to step out, please just go right ahead and do so. And we are recording today's event. It will all be going up on the website as expeditiously as we can make it go. So when you ask questions, please go to a microphone so that we can record your question also. And I think those are the only logistical issues that I need to bring up. So I would like to get the day started by introducing Eric Green. Eric is an MD, PhD, and has been the director of NHGRI since December. What day is it, Eric? Six months and a week. He's having as much fun as he possibly can stand. So we'll just roll on from here. This is Dr. Green.
Thank you, Larry, and let me give my own welcome to all of you. And thanks for coming to the Science Reporters' Workshop. NHGRI organized this day-long workshop to acknowledge the 10th anniversary of the completion of a draft sequence of the human genome by the Human Genome Project. That announcement was certainly a historically important milestone, and it was made on a somewhat artificial date, June 26, 2000.
So that was the day that my predecessor and now director of the National Institutes of Health, Francis Collins, who you'll hear from shortly, stood at the White House with President Bill Clinton and Craig Venter, then president of Celera Genomics. Of course, that's a private company that at the time was also sequencing the human genome. President Clinton made some remarkably kind remarks about the Human Genome Project at that time. He was joined by British Prime Minister Tony Blair, who echoed those remarks. His role, of course, related to the Sanger Center's involvement in the Human Genome Project, and Prime Minister Blair joined that historic event by video conference. Kudos were given both to the public Human Genome Project and to the contributions of Celera Genomics. And with that, the much-hyped race between the two efforts was declared a draw. Shortly after that, Celera Genomics essentially went out of the business of sequencing genomes, and Craig Venter later left the company to pursue other things. But the Human Genome Project continued its work for another three years, doing the tedious and difficult task of fixing the imperfections in the human draft sequence so as to improve its completeness and enhance its accuracy. Those efforts continued until the goals of the project, as defined by the project participants, were reached. And so finally, in April of 2003, the very high-quality reference human genome sequence was declared complete, and as I'm sure many of you have heard many times before, it was under budget and ahead of schedule. And I had to say that because Francis is here, and he makes me say that every time I bring up the topic. So that's actually all I want to say about the history of the Human Genome Project, although I'm sure Dr. Collins will have more to say in his remarks. Instead of focusing on the past, this workshop really aims to discuss and think critically about the future.
First, by focusing on what we've learned about the human genome in the first decade of having the draft sequence in hand, but more importantly, on where we're going with that new knowledge in the future. The day itself is progressively organized. We start with a basic refresher of what the human genome is and why we wanted to sequence it in the first place. And then we delve into the kinds of basic research that needed to be done in order to understand how the human genome works. Simply having the order of the three billion letters in the human genome is just not enough. We had to develop tools and techniques to interpret its contents, and that really remains a work in progress. As you will hear, there is actually still much debate about what is and is not functionally important among those three billion letters. Fortunately, we have evolution on our side. One of the things we've done is to compare the sequences of genomes from animals that separated from one another millions of years ago and figure out what nature thought was important, because those important bits tend to be preserved by nature. And so you'll hear about how we're using evolution to understand not only genes, but also the parts of the genome that regulate complex activities in a living cell. As the day progresses, you'll hear increasing discussion about how basic genomics research is being directed at improving human health. For example, you'll hear about the Human Microbiome Project, an effort that is using cutting-edge genomics technologies to create a catalog of all the microorganisms living in and on our bodies, with the goal of understanding how those microbes contribute to human health and human disease. One of the early lessons from the first reference human genome sequence was the realization that we're all pretty much the same, except for a very small fraction of the letters in our genome that differ among us.
Many studies have already shown that individuals differ in their risks for different diseases; just consider family history as an example. The question has always been: why do certain diseases run in families, and how do we identify the genetic basis for those patterns? The answer must lie in the sequence variation among individuals. So you'll hear a lot today about a range of efforts to understand what role genetic variation and other factors, such as epigenetics, play in different people's disease risks. To close out the basic research component of the day, you'll hear about how new and powerful DNA sequencing technologies have arrived on the scene and really are changing the face of genomics and, in fact, the face of other areas of biomedical research as well. These instruments generate vast quantities of data that are overwhelming current computer systems and computer scientists alike. We will also hear from someone who directs a modern-day genome sequencing center about how these groups tackle the challenges of deploying these incredibly powerful new technologies. Our lunchtime speaker is Sharon Terry, the executive director of the Genetic Alliance, an umbrella organization of patient advocacy groups. Sharon plays a key role in helping us remember that what we do is not just about academic research; it also matters for the millions of people worldwide who suffer from more than 6,000 rare genetic diseases. Her talk, in many ways, will set the stage for an afternoon of discussing applications of genomics to clinical research and, eventually, clinical care. The Human Genome Project was launched to improve human health, and NHGRI plays a critical role in leading the transition from basic genomics research to medical applications, starting with the large population studies needed to define the role of heredity in human diseases.
We will also hear from experts about large population studies, especially those involving traditionally underserved groups, where health disparities are common and issues of race complicate the very way we understand cause and effect. You will also learn about some prototypic clinical genomic studies. For example, the Cancer Genome Atlas, or TCGA, is a signature project co-led by NHGRI and the National Cancer Institute that is cataloging the many genetic changes associated with different types of cancer. You'll also hear about ClinSeq, a demonstration project in NHGRI's intramural research program that aims to study how to utilize genome sequence data for clinical research studies, and this is starting to open our eyes to what genomic medicine may be all about. In fact, you'll hear from the first individual in the ClinSeq study, and in any study at the NIH Clinical Research Center, to actually have his whole genome sequenced. He will describe his experience and a surprise finding in his own genome. NIH has a long tradition of finding treatments and cures for diseases, and so in the middle of the afternoon you will learn from the NHGRI clinical director, Dr. Bill Gahl, about his achievement in identifying a new drug for treating a rare genetic disease. That drug is now under expedited review at the FDA. The photos showing the consequences of the disease and the results of this new drug treatment are just simply stunning, but I'll let Bill tell you about that. At the end of the day, we will explore what people want to know about their own genetic information and how they use that information to make health decisions. During this last panel, we'll actually have some real news to report. The Annals of Behavioral Medicine has kindly lifted the embargo on its upcoming issue so that the results from a public survey can be presented by one of the co-authors.
And besides thinking about how genetic information will or should be used, we'll also consider how it should be protected. I will wrap up the workshop with some of my own thoughts about the future of genomics, presenting a brief overview of a strategic planning process that NHGRI has been carrying out for the past couple of years. So it's going to be a full day, but I'm sure that all of you are going to find it interesting and hopefully thought-provoking. Now, I've been talking about our presenters and what they will be saying to you, but we recognize that this is a workshop filled with reporters, and reporters usually have lots of questions. So the talks, as Larry mentioned, will be short, with a minimum number of slides, leaving us plenty of time for your questions and hopefully our answers. I really do hope that today ends up being more of a discussion than a series of lectures. We have nearly two dozen outstanding genome scientists and leaders at this workshop, and I urge you to take advantage of their availability. We are here to answer your questions. So thank you for your attention, and let's get on with the proceedings. It's my pleasure to introduce to you someone that I expect all of you know, my predecessor and my good friend Francis Collins. I'll actually not go through Francis's long history of accomplishments. For Francis, as well as for all the other speakers, there is a brief biography in your background books; you can also find information on NIH.gov in Francis's case, and on genome.gov for the other speakers. For 15 of the last 16 years, Francis has been my boss in one form or another. The one year when he wasn't my boss, he was, shall we say, between jobs. He recruited me to NHGRI in 1994, he made me director of the intramural program in 2002, and then he appointed me director of NHGRI in 2009.
During these many years together I've learned a tremendous amount from Francis about how to be a researcher, how to be a leader, and how to be a communicator. It really is those skills that have made Francis such an effective leader, as he demonstrated in his leadership of the Human Genome Project. So he will kick off today's event by sharing with you some of his experience and providing a historical perspective on what I believe historians will conclude was the single most important project in the history of biomedical research. Francis.
Well, thanks, Eric, for the kind introduction. Good morning to all of you. I'm really delighted to see the room full of individuals who are involved in telling the story about the genome, a story that goes on year after year, coming up with new and exciting observations about how our own DNA instruction book plays a role in health and disease. I think there is a law of technology, once cited, that says a truly transformational technology will always have its immediate consequences overestimated and its long-term consequences underestimated. I think that's turning out to be true for what we are learning from the human genome, and you'll be hearing about that during the course of the entire day. It's great for me to have a chance to be back with my genome homies now that I've moved on to the building called Building One on the NIH campus, where for the last nine months I have had the opportunity of serving as the director of the NIH.
But certainly, as I look across what's going on in all of the 27 institutes and centers, it's quite clear that genomics is a central topic of innovative excitement. Whether you're talking about heart, lung, and blood diseases or about cancer or diabetes, the institutes are utilizing the tools that have sprung out of the Human Genome Project to try to arrive at new conclusions about the causes of disease and how we might do a better job of preventing, diagnosing, and treating it. So I think it's appropriate to spend this day, and I appreciate that all of you busy people have come here to do so, trying to see where we have gotten to and where we are going next, because I do believe this science is driving a lot of the excitement right now in biomedical research, and that's likely to continue for some time. The program that Eric just laid out for you, and that's in the agenda, is actually quite broad and will touch on many different areas. My task this morning, I think, is to reflect a bit upon what's happened over the course of the last few years, perhaps touch on a few of the major milestones, and then speculate a tad about where this might all go. And then I hope we can have some interaction; as Eric said, a big part of today's intention is to make this an interactive event.
Well, that's genetics, in case you were wondering. We're pretty smart at NIH; we have figured out how to do genetics. It's just like this: you put that together with that, and you get that. Just a little bit of a silly point, but after all, what we're interested in is human genetics. We're interested not just in phenotypes that may be reflected in outward appearances; we're particularly interested in phenotypes as they relate to disease, and everything you're going to hear about today is really driven by that desire. For me, as a physician and medical geneticist, the appeal of the genome project was the chance to actually try to come up with answers for the thousands of conditions that we currently know sort of how to diagnose, although oftentimes we're not quite sure, and we often don't really know what the molecular basis is or what to do about them. So we are trying to go beyond the external to something more fundamental in our understanding of disease, using the genome as the means to do that. And here we are at ten years, as was put forward in this issue of Nature and in many press articles coming out right now with reflections upon what's happened since June of 2000. But it's also appropriate to note that it's not only ten years since that draft genome; it's twenty years this year since the Human Genome Project was started. This timeline has many different features on it, and I guess for a pointer we're just using the mouse. Okay. Well, the project was launched in the U.S. after a very important National Academy of Sciences report really got the momentum going, after there had been a lot of skepticism and a lot of frank scientific resistance, ultimately leading to the beginning of the Human Genome Project in October of 1990 in the U.S.
This timeline covers a lot of things that happened in the early days, and of course there was no real sequencing of human DNA in a serious way as part of the genome project for the first six years or so; pilot projects for human genome sequencing began in 1996. The first few years were spent trying to build the technology capabilities, knowing that we were nowhere near ready to tackle a three-billion-base-pair genome in 1990, and also testing those out on model organisms such as E. coli and yeast, and ultimately roundworms and fruit flies, along with a whole host of planning efforts building maps, physical maps and genetic maps. And so it went until an important step I just wanted to touch upon, which was in 1996. Just as some pilot projects for human sequencing were beginning, the international community carrying out human DNA sequencing met in Bermuda. This was an effort to try to come up with some shared ideas about data quality, but a very important issue that was raised was data release. This was the point at which the groups carrying out the sequencing tried to address the question of what's the right thing to do as far as access to this information, because clearly it was going to take many years to get the entire human genome sequence. Clearly it would not be possible to publish papers every four or five months along the way; publication would await at least a draft version of the sequence. And yet there were many scientists around the world who were hungry to have access to the information, so what should we do?
And that group, as recorded here in a photograph of the whiteboard upon which various versions of the policy were written, agreed to do something fairly radical. I do think it's important to point to this, because in many ways one of the most significant legacies of the Human Genome Project is this new attitude towards data release: the aim to have all sequence freely available and in the public domain for both research and development in order to maximize its benefit to society. That was the agreement, and basically the countries involved all agreed to it. None of them, as far as I know, checked with their ministers of health; it was the right thing to do. Certainly for myself, being very much part of that meeting, I was delighted to see the way in which the scientific leaders came together around this, recognizing that in so doing they were diminishing their own abilities to have first crack at trying to apply the data they were producing to interesting medical problems. They were basically saying it's up to the whole world to do this; we are data producers, but we shouldn't get any sort of special treatment as far as the use. That has played out now over the course of many years, to the point where this has become the ethic of genomics research: if you're producing fundamental data that is essentially for the community, what we call a community resource project, then the expectation is immediate data release. That has been lived up to and has extended well beyond DNA sequence information to other types of information as well, and I think it has been a profoundly positive development in terms of the progress that has been possible in science since that time. Well, going on beyond '96, sequencing really got underway in a significant way. The international community rallied around this, and ultimately 20 centers in six countries worked together to try to generate initially a draft and ultimately a complete sequence of the human genome, putting all the
data in the public domain every 24 hours. That was quite an interesting experience for me as the field marshal or project manager or whatever you want to call it, responsible for trying to make sure the project stayed on track. Of course I had some authority over the centers that were being supported by the National Institutes of Health, but no financial authority over what was going on in the other countries, so it really required everybody to have a general sense of agreement on the principles and the willingness to implement them to make this work. There were all kinds of interesting dynamics that went on there, but I think it's to the credit of an amazing group of leaders that this was done, and done beautifully and effectively, and that by June of 2000 there was a draft version of the human genome sequence completed. That meant we got to go to the White House and stand with the president, with this screen over here connected to Tony Blair in England, and have press conferences just afterwards in the White House press office. And here, standing in the front hallway of the White House, are Craig and I, realizing that the Time magazine cover had just arrived that same morning. So that was an exciting day, to be sure. And then we went to the Hilton and had a big press conference with, as you can see, lots of members of the press attending to hear what was going to be said. I just noticed Al Rebson and Ruth Kirschstein sitting right there in the front row. Ruth Kirschstein, former acting director of NIH, just died last fall and is much remembered right now for all of her contributions to NIH and to our training efforts. This is Ari Patrinos of the DOE, and this lineup here of remarkable leaders, all looking a bit younger than they do now, at the time that this was announced: Elbert Branscomb, Richard Gibbs, Eric Lander, Bob Waterston, Greg Schuler.
So that was a great day, although it was, as Eric says, a bit arbitrary, because we had at that point sequenced roughly 90% of the genome. It was a bit of a gapped assembly, to say the least. Lots of work still remained to be done. This was, of course, the moment at which the public effort and the private effort led by Venter at Celera announced together the arrival of this draft. And as Eric mentioned, with the draft now in the public domain, Celera essentially ceased work on the project. In the public effort, however, we had to go on, because it was not sufficient to have a draft of something as important as the human genome. You wanted to have the full, correct, highly accurate version. By the way, the publications describing the draft came out in February 2001, and then the finished version in 2003, accompanied by all sorts of wonderful diagrams. And again, I just want to give credit to those who did this amazing work, here in a picture from, well, probably about 2002, I'm guessing, at Cold Spring Harbor, gathered to assemble the last bits of the completed genome and representing about 2,500 people who worked together on this project. So I had the pleasure of being a Woodrow Wilson kind of person here, not only using all the brains I had, but all that I could borrow. And that was an amazing experience. But okay, so you've got that first genome. What do you do with it? Obviously, we were just beginning readers when it came to trying to make sense of this 3-billion-base-pair instruction book, and we had a lot to do to try to understand it. One of the first things we wanted to do was to compare it to other genomes, and with sequencing becoming more affordable, we had the chance to do so. Inside your book there is a reprint of the paper, published seven years ago at the time of completing the human genome sequence, that laid out an agenda for where we might want to go next.
It's kind of interesting to look back at that, because I think it actually did a fairly good job of laying out 15 grand challenges that the genomics community might contemplate taking on next, many of which have actually been achieved, all of them building on the foundation of the Human Genome Project and then applying genomics to biology, to health, and to society. During the course of today, I'm sure there will be much reflection about what's been going on in these three stories of this hypothetical, metaphorical building. Comparative genomics certainly got a lot of attention and still does, and you'll be hearing more about that today. The mouse genome got sequenced fairly shortly after the human, then the rat, the dog, the chimpanzee, the macaque, the honeybee, the sea urchin, the platypus, good heavens, all of these making the covers of major journals. But that's just a subset of the genomes that have now had their sequences revealed, and I think you'll be impressed later on today to hear how rapidly that list has grown and how ambitious the efforts are now to carry this on across a great deal of the plant and animal kingdoms. That has told us a lot, by looking into evolution's lab notebook, about which parts of the human genome have been most conserved over the course of time and are therefore most likely to have important functions. Certainly, the progress in sequencing has enabled that kind of comparative genomics and now enables applications to human genomics in ways that we could hardly have imagined 10 years ago. I would bet somebody else will show this curve later on today as well, because it's often referred to as documentation of just how dramatically the technology has advanced. The red line here shows what's happened to the cost of sequencing a million high-quality base pairs; Q20 is a measure of quality. In 1999, it cost roughly $20,000 to do a million base pairs.
That has dropped as of last year to a dollar, and it continues to drop dramatically. Compare that with Moore's law for computers, which we thought was going to be the gold standard for progress in almost any kind of technology. Actually, computers are not improving under Moore's law as quickly as DNA sequencing is. And there is no end in sight. We're not about to hit some sort of fundamental barrier; no laws of physics have to be violated for this to continue, and it's continuing in a dramatic way. According to Illumina's announcement last week, the cost of sequencing one genome has now fallen to $9,500. That's interesting, given that the first genome, depending on how you account for it, cost about $400 million. And the expectation is that we will certainly reach the $1,000 genome in the next four to five years. That means that there's a lot of data coming out, and that means the individuals who have to deal with that data find themselves in the position of this individual, the boy drinking from the fire hose, because it's one thing to generate the data and another to try to understand what it means. You'll be hearing quite a bit about the struggles, which are good struggles but struggles just the same, of the computational components of our community to try to get the maximum benefit out of all the data that it's now possible to produce. And it will only get more so as we reach the point of having complete genome sequences on tens of thousands of individuals, which is not far away. And of course, the applications to medicine are expanding, and one area that is particularly exciting right now is cancer, because cancer is a disease of the genome. This comment here, "If we wish to learn more about cancer, we must now concentrate on the cellular genome," was one of the first public calls for a human genome project.
Renato Dulbecco, Nobel laureate, made the case for the genome project in 1986, in an editorial in Science, on the basis of what it would do for cancer. And that is now coming true. You will hear today about the Cancer Genome Atlas, a project in which many laboratories are working together to find out, in a very comprehensive way, all the ways that a good cell can go bad, that a normal cell can become malignant. It allows us to move beyond the list of genes that we knew were involved in cancer to the entire genome and find out what's there, with the potential, one that's now being realized, to take a disease and really understand at the most detailed molecular level what is driving each of the 20 most common cancers and how you could use that diagnostically and therapeutically. We also, of course, are making efforts to try to understand how the genome functions. You will hear a bit about that today from the project called ENCODE, the Encyclopedia of DNA Elements, which has been working now for several years and continues to expand its reach, trying to identify the parts list of the genome: all the functional elements, which go well beyond just the parts that code for protein. This is being done both for the human and for model organisms; there's a modENCODE project, which is making substantial progress in identifying what those components are. And they must be important, because you have the same genome inside your liver cell or your brain cell or your muscle cell, and yet it is obviously functioning differently in terms of which genes are on or off. That's all part of epigenetics and epigenomics, and the ENCODE project and some other projects you'll probably hear about later today are focusing on trying to learn what those rules are. How is the genome marked up by various proteins to tell genes whether they should be on or off? How does that get affected by environmental exposures, and how can it contribute to the presence of disease?
A very exciting area indeed. Mice have certainly for a very long time been a strong component of our efforts to understand the ways in which genetic changes can result in different phenotypes and in disease. After many years in which mouse genomics was pretty much the domain of individual investigators, the time arrived when we really felt it was appropriate to begin to do this in a more systematic, organized fashion. The Knockout Mouse Project has been underway now for three and a half years and is basically devoted, together with European colleagues, to the idea that it would be good to develop in embryonic stem cells a knockout of every single one of the mouse protein-coding genes, doing this in a systematic, organized, efficient way, using the most cutting-edge technology to make those knockouts by homologous recombination, and then making those embryonic stem cells available to investigators who are interested in using them for all kinds of purposes. Very recently, NIH has made a commitment to go beyond just the creation of those embryonic stem cells and actually begin systematically to determine the phenotypes of each one of these mice, again with our European colleagues as collaborators. So the mouse will continue to be a real driving force in our efforts to connect genotypes and phenotypes. Another project, which you probably haven't heard much about but which I think has been quite beneficial to the research community, is something called the MGC, the Mammalian Gene Collection. Here is the problem. If you are studying a particular gene and you want to understand its function, you're often most interested in having an actual copy of that gene that includes the entire coding region, the part that codes for protein. That would be a cDNA. And yet postdocs oftentimes labor for months trying to obtain that full-length cDNA, and it's not a particularly good use of their time.
So NHGRI, again with support from many of the other NIH institutes, assembled an effort to do this systematically, once and for all, and to put together a complete set of full-length cDNAs for human and for mouse. Those were derived in various ways. The last set of them were actually derived by synthesis, because one can now do that in a way that was not so possible a few years ago, filling in the gaps that were present up until then. So the full-length cDNA project has turned out to provide resources that just weren't available before to individuals who are interested in using them. And what about finding the genetic contributions to disease? This diagram, published eight years ago now, was sort of a good news and bad news story. Plotted here, as of 2002, is the number of human Mendelian conditions, that is, dominant, recessive, or X-linked highly heritable single-gene disorders, that had had their molecular basis discovered. You'll notice there wasn't very much going on until the late 1980s, and then it really zoomed upward, and that is a consequence of the Human Genome Project providing maps and technologies that enabled you to go and find the causes of those single-gene disorders. So we could document that one with a smiley face. That was good, and it has continued to go way up since that time. But there's another part of this diagram that's not pretty at all: human complex traits, such as diabetes or Alzheimer's disease or hypertension. Now, you've got to read this axis: at this point these authors believed only seven variants had been conclusively shown to play a role in those common disorders. And yet those are the disorders that fill up our hospitals and clinics. So it's clear we weren't getting very far with that part of the effort, and something else had to be done.
Well, what came along at that point was some technology and another organized project, the HapMap Project, which was a natural follow-on to the Human Genome Project. It focused on that roughly 0.4% of the genome where individuals differ, and on trying to understand how that variation is organized into neighborhoods across chromosomes, because that could be quite useful in identifying where the common variations are that are associated with common diseases, which had previously been elusive. This effort, which I also had the privilege of leading, involved six countries, not quite the same countries, and a host of dedicated scientists who put their shoulders to this wheel. It produced all the data in three years, gave it all away just as we had with the genome sequence, and created the first comprehensive map of genetic variation across all of the human chromosomes. That resource came together with dramatic improvements in genotyping technology that would allow you to take a DNA sample and test it at a million places in the genome where there were known to be common variations, and to do so at a very affordable cost. Again, this is a log scale. The cost came down rapidly between 2001 and 2005 and has continued to come down a bit since then, although not quite as dramatically. The results of that will probably also get shown by more than one person, because it's such fun to produce these diagrams, but I'll be the first. In 2005, the first success was reported of taking what we knew about genetic variation from the HapMap, taking those technologies, scanning across the whole genome, and trying to find a genetic variation that was a significant contributor to the risk of a common disease. And this was the first time it worked, because the HapMap was just becoming available.
So here are the human chromosomes, 1 through 22, X and Y, and the success here was finding a variant on chromosome 1 in a gene called complement factor H that turned out to be a major predictor of the common late-onset disease called age-related macular degeneration. Macular degeneration is almost the most common cause of blindness in the elderly, not a disease that I think a lot of people expected was going to have huge genetic contributions, because it comes on so late, and yet here it was. And it pointed to a gene, complement factor H, that was on nobody's candidate gene list, again emphasizing the power of being able to scan the whole genome and not limit yourself to your best hunches. Well, that was a success that woke everybody up: wow, this is going to work. So many groups then began doing this same kind of work: looking at thousands of individuals with a particular disease and thousands of otherwise well-matched individuals who did not have that disease, scanning across the genome, and trying to find variations that seemed overrepresented in the affected individuals. That's called a genome-wide association study. So in 2006, three or four more of these appeared. Each one of these colors actually represents a different disease, but I'm not showing you the key because it would get much too busy. So let's see how busy it gets. In 2007, things really started to pick up, so we'll take it three months at a time: first quarter, second quarter, third quarter. This is the end of 2007. Now dozens of these variants are turning up that are associated with Crohn's disease, with a variety of heart conditions, with diabetes, type 1 and type 2. On into 2008, more and more of these bullets appear. Here we are at the end of 2008. Now we're up to well over 50 of these. In 2009, the numbers continued to grow. And here we are as of April 1st of this year. Now there are hundreds of these, validated as clearly true, real associations with common disease.
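The core comparison behind a genome-wide association study, asking whether a variant is overrepresented in affected individuals, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any consortium's actual pipeline: it looks at a single variant, assumes allele counts have already been tallied for cases and controls, and uses a plain allelic odds ratio with a Pearson chi-square test (one degree of freedom). The function name `allelic_association`, the threshold constant, and the counts are all hypothetical.

```python
import math

# Commonly used genome-wide significance threshold (roughly a Bonferroni
# correction for about one million independent tests).
GENOME_WIDE_THRESHOLD = 5e-8


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Hypothetical helper: compare allele counts for one variant in cases vs controls.

    Arguments are counts of the risk allele and the other allele in each group.
    Returns (odds_ratio, chi_square, p_value). All counts must be positive.
    """
    if min(case_risk, case_other, ctrl_risk, ctrl_other) <= 0:
        raise ValueError("all allele counts must be positive")
    # Odds of carrying the risk allele in cases relative to controls
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Pearson chi-square statistic for the 2x2 table (no continuity correction)
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


if __name__ == "__main__":
    # Hypothetical: 2,000 cases and 2,000 controls, i.e. 4,000 alleles per group
    odds_ratio, chi_square, p_value = allelic_association(1200, 2800, 1000, 3000)
    print(f"odds ratio {odds_ratio:.2f}, chi-square {chi_square:.1f}, p = {p_value:.1e}")
    print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

With these made-up counts (risk allele frequency 30% in cases versus 25% in controls), the odds ratio is about 1.29 and the p-value is around 5.5e-7, which would not clear the 5e-8 genome-wide threshold. That is one way to see why these studies needed thousands of individuals to detect variants of modest effect.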
Each one of these points toward a gene or a pathway that seems to be involved in contributing to risk. Now most of these are rather weak in their contributions. If you had one of those variants, your risk of that particular disease would only go up by perhaps 10, 20, or 30 percent. And it's still unclear where the rest of the heritability is hiding. In fact, we have now coined a new term here, which is hard to see: the dark matter of the genome. Because for most diseases like diabetes or hypertension, the heritability that's been discovered by this genome-wide association effort only represents something in the neighborhood of 10 or 15 percent of what must be there. So where is the rest? Well again, the genome-wide association approach really only works if you're looking for a common variant, something that's present in 5 or 10 percent of the population. If it's rarer than that, you don't have enough power to find it. And this is where the ability to go beyond the association study to sequencing is coming along at just the right moment. And the expectation is that a lot of the dark matter of the genome will turn out to be less common variations that actually have larger effects than the common variants found so far. But that has of course not stopped efforts: because we do have all of these common variants that are well validated, companies such as the three that you see here have been marketing directly to consumers the opportunity to find out individual risks for disease. Some of us have tried this out to see how these companies would in fact handle the request and what kind of data you can get. In this case, I actually decided not to use my own name, because I thought they might give me some sort of different treatment if they knew who had just put their DNA into the system. So for all three of these companies, I sent in my DNA, made up a little bit of medical history, and asked them to tell me what they found.
And what was interesting about this? This is a very hot topic right now, because there's a big question about whether there should be more oversight. I would say all three of these companies seem to have highly accurate lab methods. That is, there were no differences, in a very long list of DNA tests that they did, in terms of who said I had a T and who said I had a C at a particular position. What you would call the analytical validity, that's the term of art here, of the methods these companies use for DNA is in general actually quite good. These days, if you're using a decent platform, you shouldn't make very many mistakes in getting the DNA test result correct. The differences were in the interpretation, and you can kind of see why that would be. For my DNA, all three companies agreed that I had an increased risk of diabetes, maybe something like 50% higher than the average person, and that was actually a bit of an eye-opener and caused me to change my health behaviors a bit. But they were all over the place in terms of other predictions, like prostate cancer, where one of them said I was lower than average, one of them said I was average, and the other said I was above average. So what was that about? Well, it was simply a matter of which particular publications that company had depended upon to make its prediction, and with the field changing so rapidly, you could kind of see how they could come up with different answers. Although that's obviously unsettling if you're thinking that these are, in fact, recommendations that people are going to pay attention to. This is a moving target. These are clearly early days. Again, as I said, a lot of the heritability for common diseases hasn't been discovered yet. So whatever you can get out of these analyses is going to need to be revised going forward, as we learn more and more about what the actual major heritable risks for disease are. This is just the first view of this.
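The gap between a relative-risk figure, such as "10, 20, 30 percent" or "50% higher than the average person," and what it means for one individual can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a hypothetical 10% baseline lifetime risk and treats each variant's relative risk as independent and multiplicative. That is a common simplification, and the companies' actual models are not described in this talk. The function name `combined_absolute_risk` is hypothetical.

```python
def combined_absolute_risk(baseline_risk: float, relative_risks: list[float]) -> float:
    """Hypothetical helper: scale a population-average risk by per-variant relative risks.

    Assumes the variants act independently and multiply risk; the result is capped at 1.0.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    risk = baseline_risk
    for rr in relative_risks:
        risk *= rr  # e.g. a 20% increase means multiplying by 1.2
    return min(risk, 1.0)


if __name__ == "__main__":
    baseline = 0.10  # hypothetical 10% average lifetime risk
    scenarios = [
        ("one variant, +20%", [1.2]),
        ("three variants, +10/20/30%", [1.1, 1.2, 1.3]),
        ("'50% higher' overall", [1.5]),
    ]
    for label, rrs in scenarios:
        risk = combined_absolute_risk(baseline, rrs)
        print(f"{label}: {baseline:.0%} -> {risk:.1%}")
```

On a 10% baseline, a single variant that raises risk by 20% moves someone from 10% to 12%. Even a "50% higher" estimate means 15% rather than 10%. This illustrates why these variants, taken one at a time, predict relatively little for an individual.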
At the same time, you will hear about studies during the course of today indicating that people who are given the chance to find out this information do, in fact, seem to be interested in it, and do, in at least some instances, have the ability to use it positively to modify their own health behaviors. For the most part, they are not confused about the nature of this information. They don't necessarily see it as deterministic, and it's not. It's predisposing. So there will no doubt be a rather intense discussion over the coming months about the degree to which this kind of direct-to-consumer marketing needs more oversight than it currently has. But I think it has, at the very least, raised public consciousness about the opportunities that are there. At least for the early adopters, it has given people a chance to begin to think about how to use this information for their own preventive medical care. And this is what personalized medicine ultimately needs to be about: knowing what your risks are and modifying your behaviors accordingly in order to optimize your chances of staying healthy. A very important thing happened in terms of making sure that this kind of information won't get used against people. After 12 years of hard work, with many ups and downs and often progress followed by disappointment, the Genetic Information Nondiscrimination Act was signed on May 21st, 2008. It protects all of us against discriminatory use of predictive genetic information. It does not allow that in health insurance or in the workplace, and doing so is now against the law. And that is a very good thing in terms of providing safety for people who are interested in having the information.
Worthy of note, however, is that GINA doesn't prevent the use of this information in discriminatory ways in some other settings, such as life insurance, long-term care, or disability. So people who are contemplating finding out information about themselves might want to be aware of the consequences that could follow. Another area that I think is growing rapidly, and which perhaps hasn't received as much attention as it should, is the way in which studies of the genome have also pointed to variations that play a role in drug metabolism and drug response. This is pharmacogenomics. A particularly cogent example is the case of Plavix, or clopidogrel, which is the second most prescribed drug right now in the U.S., after statins. This is a drug that prevents clots after a heart attack or a stroke, and it is commonly used to prevent a recurrence. But it actually doesn't work for about 30% of the population, as has recently been realized. So what's that about? Well, it turns out that Plavix, or clopidogrel, is not actually the active form of the drug. It has to be converted by an enzyme in the liver in order to become active. Otherwise, it does you no good. And it seems that variations in the gene coding for a particular liver enzyme, called CYP2C19, are common in the population. If you have a not very active form of CYP2C19, your Plavix dose isn't going to help you, because it is not going to get converted to the active form, so you may as well not be taking it. And FDA, after reviewing this data, has gone as far as to add a black box warning to the label telling prescribers that they should be aware of this. Ideally, individuals who are getting Plavix should be genotyped to see whether they are going to be in the non-responder category, because if so, you might want to choose a different drug. Plavix isn't harmful if you have that genotype. It just isn't helping you. There are other examples where the drug can actually be harmful.
Abacavir is a drug given for HIV. About six or seven percent of people who get abacavir will have a dramatic and even potentially fatal hypersensitivity reaction, and we now know exactly what that's about, and it's possible to predict that response. And FDA now recommends that everybody be tested before taking the drug to see if they're susceptible to that hypersensitivity reaction. Now, there's one thing that I wanted to mention, because there's been much up and down about all these discoveries of genetic variations that are associated with diabetes and heart disease and all these common diseases. I showed you that diagram building up over the years to hundreds of these, and yet it is the case that most of them are modest in their individual contributions to risk. So some say we didn't really learn anything very useful, because the predictive value of these tests is pretty modest. Well, that's true at the moment, although, again, I think the predictive value is going to get better in the next three or four years. But one thing we should not say is that these discoveries are irrelevant in terms of what they're teaching us about mechanism or about possible therapeutics. Because if you find a variation in a gene, even if the contribution of that variation is modest, if it's clearly validated as a risk factor for disease, that tells you that gene is important to the pathogenesis of that illness. So I would say basically a modest odds ratio is probably not really relevant to whether that's a good drug target or not, as long as you're sure the answer is right. Another common misconception is that if you developed a drug against that particular target, it could only work for the people who have the version of the gene that increases risk. Let me show you an example of why I think this can't be right. My own lab works on type 2 diabetes in collaboration with other labs. We're now studying more than 100,000 patients with diabetes and controls.
But the first nine gene discoveries by genome-wide association studies for type 2 diabetes are the nine that you see listed here, and they happen to be about nine of the strongest ones. These are genes that, for the most part, we did not expect to find in this search. They were surprises. They weren't on the candidate gene list, although there are a couple of exceptions. One of the exceptions is KCNJ11. So what's that? It turns out that is the target for the drug class called sulfonylureas, such as glyburide. This is one of the major drug targets that we already knew about for the treatment of adult-onset diabetes. PPAR-gamma pops up on this list. Well, that's the drug target for the other major class of current oral agents for diabetes, the thiazolidinediones. Avandia is a thiazolidinedione. So what do you know? You search the whole genome with a genome-wide association study, you get nine hits, and two of them turn out to be the known drug targets for type 2 diabetes. And I can tell you that the drugs that hit those targets don't just work for people with the risk version of those genes; they work for everybody. So there's a lesson here. Probably somewhere on that list are some other drug targets for diabetes that we didn't already know about. And people are now pursuing those. And actually this list of variants that we know are associated with type 2 diabetes is about to grow, in the next month or two, to almost 40. So that picture I showed you, with hundreds of colored circles decorating the genome, is a picture of what are now targets for drug development, and there's a lot of opportunity there, because these are targets that are essentially validated in humans. That's where the study was done. So I think this is pretty exciting. But it also means there are now more targets than there are necessarily companies ready to start working on them. A lot of these targets are of uncertain value. Many of them are considered maybe questionably druggable.
So there's a lot of work to do to get to the therapeutic side of this. You'll be hearing about this this afternoon. But I think what many people have not realized is that over the course of the last 5 or 6 years, NIH has been preparing for this by investing resources in capabilities that empower academic investigators to get into the drug development business. That doesn't mean turning NIH into a drug company. It means providing resources and technologies so that academic investigators can begin to contribute to the earlier stages of drug development. And that means putting together, in something called the Molecular Libraries Initiative, the capability for an investigator who's just made a basic science discovery, maybe discovered a possible drug target, to develop an assay. That assay can then be run through a robotic system, which is the one you see here, up in Rockville, that will allow you to screen that target against hundreds of thousands of chemical compounds. You can think of them basically as molecular shapes, and the screen shows which of those might actually have activity against that target. And the centralized facility will then also provide you with some medicinal chemistry to do some optimization and end up with something that's even better. All this data goes into a public database, and effectively now hundreds of new compounds have been identified that have promise here in terms of therapeutics and that would never have been pursued had it not been for this NIH-supported effort. And hundreds of investigators who never thought of themselves as getting into this kind of science are now quite happily doing so. What we are trying to do now is to see how to optimize this pathway, from developing information about a target all the way through to FDA approval, in collaboration with the private sector but with NIH playing a more substantial role, particularly in the earlier stages.
Assay development, high-throughput screening, medicinal chemistry optimization, animal testing, toxicology, then asking the FDA for permission to conduct clinical trials, and ultimately seeking approval: all of these components are in here. And a new component, which was just authorized by the health care reform bill, is something called the Cures Acceleration Network. If the appropriators decide to also allow dollars to be spent on it, which we are waiting to find out for FY11, it will further accelerate our ability to invest in drug development. You'll be hearing more about that this afternoon, I'm sure, from Chris Austin and Bill Gahl. So just to wind up here, because I've gone on a bit, as I tend to do. Yes, I'm getting the shut-it-down sign here. I thought it would be interesting to actually go back, since we are here thinking about the last 10 years, and say: well, what were the predictions that were made in 2000, when the draft sequence was being announced? Because of the way in which you can save your own PowerPoints, I can actually go back and pull up exactly what I said. Obviously this could be embarrassing, because prediction is difficult, especially about the future. Who said that? Well, somebody thinks it's Yogi Berra, and somebody else said it was Dan Quayle. It was actually Niels Bohr, the physicist, who originally said this, but if you go to the web you can find citations of no fewer than 29 people who are claimed to have said it, and now I suppose I will get added to the list.
So I said at the time of the draft sequence what will we know by 2010 you can see the fonts and everything are just looking very dated predictive genetic tests available for a dozen conditions that's what I was predicting by this year well we're clearly wildly beyond that whether they're actually very good or not is another question but we certainly have predictive genetic tests as you saw from all those direct consumer companies that have sprung up available for several of these well sure for cancers for instance for diabetes we do have interventions that we know work many primary care providers begin to practice genetic medicine well to find many I think we're still in trouble here because most have not yet embraced this opportunity a pre-implantation diagnosis PGD widely available limits being fiercely debated well that certainly happened this approach that allows a couple to screen embryos to choose the ones that they think are going to be most ideal originally developed for diseases like Tay-Sachs and now being more and more utilized for milder conditions and certainly lots of limits being debated happily and I was not sure in 2000 this would happen but this did happen Gina passed in 2008 so that one was succeeded and then access remains inequitable especially in the developing world well we've made progress with healthcare reform this year in this country but certainly access is going to continue to be a problem just to really put myself on the line I also made a prediction about 2020 the gene based designer drugs would come on the market in another 10 years again with all those targets we've discovered that's a fairly good chance that this will be true the cancer therapy will be precisely targeted to molecular fingerprint of the tumor that one's clearly going to happen before 2020 the pharmacogenomic approach standard practice for many drugs well we're pretty far along with a few like clavix and abacavir already and more to come mental illness we have still 
not really achieved this certainly in 2010 we know there are heritable factors and there are relatively few of them have turned up and homologous recombination technology suggests germline gene therapy can be safe well I don't know I would we'll have to see going on another 10 years on the limb 2030 and again this prediction made in 2000 not made today comprehensive genomics based health care norm individualized preventive medicine available environmental factors and their interaction with genotype pinpointed for many diseases illnesses detected early by molecular surveillance gene therapy and gene based drug therapy available for many diseases well I hope that will all be true even before 2030 a full computer model of a human cell will replace many lab experiments an interesting challenge average lifespan reaches 90 years stressing prior socio-economic norms only if we solve the obesity problem well that come true major anti-technology movements active in the US a rebellion against all of this and a serious debate underway about humans possibly taking charge of their own evolution well I think we got there a little before 2030 so perhaps our future is of genomic medicine will involve a broader use of DNA sequence hopefully not in a hard copy the electronic medical record had better come along and rescue these people but hopefully we'll also have the knowledge and even the wisdom about what we can learn from this so that we won't have the physicians going I don't know what do you think will actually be empowered by the rational realization of what we all hope to have come true the rationalizing of an evidence-based medical system based upon an understanding of the genome so I'll stop there and let's have some discussion thank you very much how important is the concept of gene modifiers and at this point how well do you think we understand their role I'm sorry Mark Johnson from the Milwaukee Journal Sentinel so gene modifiers traditionally referred to as what are 
the stages of the genome that affect the severity of a particular condition and often are utilized when you're talking about a disease that's caused by a single gene so for instance cystic fibrosis clearly not everybody with cystic fibrosis has the same clinical course even people who have exactly the same mutations in the main genes CFTR may have very different pulmonary severity so what's that about you're looking for modifiers that turn out to affect that sickle cell anemia is a particularly interesting one right now if you have sickle cell anemia and you happen to have inherited your sickle mutation from ancestors in the Middle East say in Saudi Arabia you may have rather mild disease whereas you have sickle cell anemia and you inherited your sickle mutation from some West African source you'll probably have more severe disease so same mutation so what's that about well it turns out there are various modifiers of the sickle mutation including a very exciting newly discovered one called BCL11A and what they do is to determine how much fetal hemoglobin you make as an adult making fetal hemoglobin as an adult doesn't sound like something you'd want to do but if you have sickle cell disease or thalassemia it can be life saving because it basically compensates for that sickle mutation and so a modifier that allows you to do that discovering the modifiers for sickle cell disease has already activated a pretty exciting therapeutic project because while we haven't had much luck coming up with therapies that directly fix the consequence of the sickle mutation we could maybe come up with a therapy that would turn fetal hemoglobin back on now that we know about that modifier pathway so many people are interested in tracking down modifiers because they may actually be better drug targets than the primary mutation itself Jill Wexler with Pharmaceutical Executive Magazine you mentioned some innovations and development at NIH to work more with the private sector developing 
therapies and I'm wondering if you feel that the pharmaceutical industry and the biotechnology industry at this stage in their situation are investing the resources or have the resources or initiating the activities needed to partner with the biotech and pharmaceutical companies doing as far as making the most of this new collaborative opportunity within NIH it's a tough time in biotech and in pharmaceutical companies right now with in terms of the biotechnology community limited access to venture capital which has really been significantly hammered by the economy and many pharmaceutical companies are pulling back on R&D also because of their own financial concerns so I think everybody recognizes we need a new paradigm if we're going to have this next generation of new molecular entities that everybody dreams of we have to come up with a more effective way of having the academic and the private sector investigators work together and I think that is emerging in a very exciting way and my conversations with both biotech and pharma are the idea of a model where NIH does contribute more of the effort towards de-risking projects that otherwise would have just not seemed attractive on an economic basis if NIH can actually pursue a new drug target and get it to the point of looking rather promising and even into and through the valley of death to show that you have a compound that could be safe and effective in humans then initiatives that might not have gotten started handed off and again NIH's goal here would be to get something to the point of them saying is there a company out here that's interested in which case here you go let's license this out and see what can be done as far as therapeutics the other thing that would help a lot is if it were possible for compounds that have been studied already at great expense in biotech and pharma but abandoned along the way because of lack of efficacy and particular disease of where they were developed if those could be 
liberated and put into these libraries that are being screened now for new drug targets you can imagine what a bonus it would be if you hit one of those because there's already so much known and so much money has been spent and obviously that is a topic of great interest and I think there's a lot of interest in companies in that very model as long as they can retain their IP so I like the way in which this is shaping up but there's a lot of details that have to get worked out so the IP issues are complex are still being looked at in more detail now by our own patent lawyers and by those in industry I haven't seen a problem yet that looks like it could not be solved clearly IP is important for a compound that you've invested a lot in to and that you hope for a compound product and NIH understands that as well so if NIH for instance is going to pursue a new drug target and take a compound all the way through the valley of death NIH will at that point want to claim IP as well so that if it looks promising it can be licensed out to a company with the IP attached to it otherwise I think the company won't be very interested I'm Meredith Wadman with Nature I'm Meredith you mentioned the Cures Acceleration Network what Congress might do with that a two-part question the first is what would the network accomplish that isn't already being done by clinical and translational science awards and secondly if Congress doesn't provide additional money for the network would you fund it by taking money from somewhere else in the NIH budget so what the Cures Acceleration Network will do and this is all written into the health care reform legislation I'm sure you've read all 2700 pages of that bill so you've no doubt encountered it what can as it's being called does is provide NIH with some unusual and useful flexibilities so it allows the funding of very large grants that can be academic and private sector partnerships up to $15 million a year and with matching capabilities so that if 
there is a good model there for funds to be matched by the private company that can be done which is not easily done otherwise by NIH it also authorizes that some fraction of this money can be spent in the same way that DARPA operates in what's called flexible research authority which means that we could set up projects with project managers who would have a lot of flexibility in terms of bringing resources to a project when they were needed and killing projects when they need to be killed and that could be very useful for projects that kind of day to day aggressive management and we don't have that capability for this area at the moment in terms of what we would do if the appropriators don't come forward we can't do anything because the way the bill is written and this was I'm sure Arlen Specter's intention unless appropriations are put forward specifically for this project I as the NIH director I'm not allowed to use those authorities it has to be appropriated for explicitly and so we are waiting to see in FY11 what the House and Senate will decide to do yes Hi I'm Elaine Richmond of Richmond Associates from Baltimore we write a lot about eye and vision and neuroscience so this is especially relevant to what we do thanks for that very clear presentation I thought your graphics were excellent to real nice clear simplicity I noticed that most of your publications are with science and nature well those covers I showed were just because they were genomes that got sequenced that made the cover I could show you a lot of other covers as well but go ahead ok so my question was there's special relationship about getting the information out through widely respected publications and widely subscribed to publications that was one of my questions the other question is so this is reaching the scientific community how about reaching the public community so in terms of scientific publications we encourage grantees of the NIH and genomics or whatever to aim as high as they can 
and to get publications into the journals that have the broadest circulation the greatest prestige science and nature happen to be near the top of that list but other journals are also now publishing lots of papers about genomics in terms of the public outreach that is certainly a goal of all of the institutes at NIH the genome institute is by convening this today I think trying to encourage all of you who do reach out to the public to have contacts and information that might be useful even when you're not working against a 5pm deadline which I know could be a bit of a struggle for everybody for me as the NIH director one of the things I most hope to do is to improve our ability to communicate information to the public we have lots of outlets for that I particularly point to the National Library of Medicine to what they do with Medline Plus with the way in which public access and PubMed has made the primary literature available even to people who don't have a subscription all of that effort is a high priority for us for me personally I'm now a communicating editor to Parade Magazine which reaches 72 million people every Sunday if you read Parade a week ago you saw a piece that I wrote with Tony Fauci about HIV AIDS you'll see a piece about colon cancer coming up in the next couple of weeks anything we can do to try to get the word out there we are doing it but obviously it's tough right now I don't have to tell all of you with all of the pressures particularly on print media meaning that more and more newspapers are being squeezed in terms of science reporting we are I think finding it even harder to try to get stories out there in a way that allows an expert to write about them but we're aiming to help in every way we can and I'm sure Larry and other who are here would be glad to talk to any of you it breaks about ways we could do this job better when it comes to telling the story of genomics Hi Francis David Ewing Duncan author and writing for Fortune Magazine 
today. I had a question: there still seems to be a fair amount of resistance, especially among physicians and consumers, to using genetic information, especially predictive markers, even at the level of medical education. I talk to psychiatrists all the time, for instance, who have not heard of the AmpliChip, and they're prescribing drugs to their patients, and there's nothing happening there. Since we're talking about 10 years ago, would you have anticipated this type of resistance? First of all, I don't know if you agree there's resistance, but would you have anticipated it, and what does the research community, the NIH, do to try to overcome that resistance? I'm afraid I would have. One of the reasons that I helped start the National Coalition for Health Professional Education in Genetics, NCHPEG, 10 years ago, along with the AMA and the American Nurses Association, was the expectation that this was going to be an uphill battle. Most physicians don't have any formal training in genetics, or only very sketchy training from their medical school experience. Genetics is still sort of seen as the sort of thing that goes on in the tertiary care medical center and is not relevant to the daily practice of the average primary care doc. So there are a lot of barriers to overcome. And frankly, physicians are overwhelmed: you've got to know this, you've got to know that, you've got to know the other thing. Until they're completely convinced, they're not going to have the time to put into this. So what it is taking, I think, is for the logistics and the frequency of the interaction to reach the point where it becomes unstoppable. The logistics are a real issue. For a physician who wants to write a prescription for a patient who's sitting in front of them, the idea that you have to do a DNA test and wait for the result to come back before you decide on the dose or the drug just doesn't work. This is going to get solved when it becomes feasible for many of us to have our
genomes sequenced in advance and placed in the medical record. Then it's click the mouse and find out whether that's the right drug or the right dose for that patient. Then pharmacogenomics, instead of having to order an AmpliChip, becomes an informatics exercise. And I suspect that, together with the advent of the electronic medical record, which really needs to happen in the next five years according to the Obama plan, will finally get us to the place where that becomes just irresistible to the average practitioner, and that's going to make a huge difference. I think what will also help in the meantime is more and more patients coming in with their 23andMe printouts, saying, OK, I had my DNA tested and here are the results, tell me what I should do. Physicians really don't like to be in the position of saying "I don't know" more than once or twice a day, and so if that starts to become more the norm, I think that will also drive a determination to get up to speed. And the good news is that genetics is not that complicated. With a little motivation, and with some good web-based, case-based materials, which NCHPEG has already developed and which are waiting there for that teachable moment, I think most physicians can acquire these skills in the space of just a few hours of intentional effort. It does require a grasp of statistics, but of course anybody who's giving advice about your cholesterol is using statistics, whether they're calling it that or not. So I'm guardedly optimistic that we will reach the point where it just becomes "well, of course," and every practitioner finds that they really don't want to go forward without incorporating this. It isn't quite there yet for most practitioners. Last question. Hi, I'm Sue Darcy from the Gracie. It sounds like you're totally endorsing these direct-to-consumer genetic tests. I mean, you went ahead and took three yourself. But I've been noticing there's a lot of resistance from federal agencies, and, as in the case of New York, state agencies don't want to see these
tests in drug stores. You know, they stopped one company from doing it a couple of weeks ago, the Walgreens and Pathway Genomics story. So what is your position on that? And isn't there a downside to letting consumers just spit into a vial, in what might be unsanitary conditions, and then for a lab to take this sample and figure out whether they're going to get cancer or whatever? So you're right to challenge me on this, and if I came across as sounding like I think this is the greatest thing, then let me correct myself. I think it's interesting. I think it's an important development. I think there are certainly people who have gone through this kind of testing and have found it empowering, and it has given them a chance to modify health behaviors accordingly, in a way that maybe they should have done anyway but that actually may result in a better outcome. But it is still an unregulated circumstance that we should all be a bit worried about. Certainly I've been part of panels going back more than ten years that have looked at this issue and have urged FDA to take a more aggressive role in overseeing this kind of genetic testing, which they currently do not regulate. But of course balance is what you're going to try to strive for. You would like to be able to assure consumers that this kind of information is useful and valid, but you don't want to squash an industry that's just beginning to get going at a time when it could be a critical part of our future in terms of personalized medicine. I know the FDA is working hard to figure out what to do. One of my closest colleagues in the government is Peggy Hamburg, and we have worked over the course of the past many months to think about ways to approach this. I think FDA is now going to take some positions on this that perhaps have not been possible in the past, but I'm not able to tell you right now exactly what they're going to be. I have not heard of an actual example of a harm that has come to somebody as
a result of this. There may have been such harms, and you can certainly see what harms might be there. Somebody who gets tested by 23andMe and is found by their testing to be at low risk for breast cancer might then decide, well, I don't need a mammogram anymore because this test said I'm at low risk, and obviously the test is not appropriately used that way; it's only sampling a small part of the heritability. Certainly somebody who gets tested by one of these companies and finds out they're at high risk for Alzheimer's disease, that they're homozygous for the E4 allele of ApoE, increasing their risk 15-fold, could find that has a pretty significant effect on you, especially if you didn't have a counselor working through the information with you. I don't know to what extent that has happened; we just haven't heard about it. So you have to balance those hypothetical harms against what are still also somewhat hypothetical benefits, in terms of people getting information that they want and using it in ways that will help them maintain their health. I can't tell you exactly where the benefit and risk play out at the moment; we don't have enough data. You will hear some presentations this afternoon, though, that I think will be somewhat reassuring in terms of the ability of people in the general public to figure out what this information is and what it isn't. People are actually a lot smarter than some of the critics of this particular industry have given them credit for. All right, thank you all very much. Thank you, Francis. It's a great way to start the day. To attempt to stay on schedule, we are going to just keep moving right along. The first of a series of panels is going to be given by four members of our Division of Extramural Research. I'm just going to introduce each one; they're going to come up and give brief five-minute presentations, and I'm going to wait for questions until each of the four individuals has spoken. Then we'll have them up here and they'll take questions as a group. So the first speaker
in this initial panel is Mark Guyer, who is actually the director of our extramural program, and he will be giving an overview of the human genome. Well, I can get started while they're putting up the slides. As I anticipated, both of the previous speakers basically said what I was planning to say. In fact, last night I emailed Francis saying, well, what Larry asked me to cover sort of sounds like what I expect you're going to do; can you give me a little better idea of what you're going to say? And he wrote back and said, well, I'm going to do a romp through the genome for the last 20 years, and I think that's precisely what he did. So I'm going to throw out a little bit of what I was going to say, and I want to start off by talking about two of my hobby horses about genomics. One is that I think genomics has always suffered from appearing to be simpler than it is. The concepts of what genomics is trying to do, figuring out the sequence of these strings of As, Cs, Ts and Gs, or assembling units into the whole like a jigsaw puzzle, are pretty easy for people to understand. The central dogma of molecular biology, information flow from DNA through RNA to protein, which we're still in the process of interpreting, is by this time, I think, pretty simple for people to understand, and it leads very much to the simplistic understanding that genes determine phenotypes and that's all there is to it. So in general it's been both a blessing and somewhat of a difficulty that the goals of genomics have been so easy to understand at a high level, but actually getting there, learning what the specifics are, and accomplishing what's been accomplished to date has been incredibly difficult, and I think the magnitude of the technological achievement has really yet to be adequately described. An enormous amount had to be learned along the way about how to operationalize molecular biology with unprecedented attention to detail, production strategies, data quality, data access, and really a new mantra for
academic research. The problem that sequencing the human genome set out, and the problem that still faces us in understanding the way in which the human genome works, is so much bigger than any one laboratory could expect to address and solve that it made it in each group's and each person's interest, in each of these investigators' interest, for everybody else to be successful. And that, I think, is one of the basic keys to why the genome project worked: people really bought into that approach to science. The second point I want to make is about thinking of genomics as large-scale biology. What to me is most characteristic of genomics, and what potentially distinguishes it from other disciplines within biology, is that the goal is comprehensiveness. I've heard the term comprehensiveness used a couple of times today, and it's almost always used as a synonym for complete: we have the complete sequence of the human genome. Actually, what we mean by comprehensiveness at any time in the history of large-scale biology is to determine everything that can be learned using existing technology. So when we announced the quote-unquote finished version of the human genome in 2003, we were pretty careful to always try to use the term essentially finished, which meant as good as we could do at the time. And even though the finished genome was announced in 2003, since then there has been an active, ongoing effort to improve it. There is a group called the Genome Reference Consortium that acts as a sort of hub: anybody who is working with the quote-unquote essentially finished human genome sequence and who finds what they think is an error, or who may be able to fill a gap that we couldn't fill in 2003, can submit the information, and the Genome Reference Consortium will evaluate it, do some testing experiments if necessary, and update the reference sequence. So even the finished sequence at this stage is a dynamic activity. In the eight years
or seven years since the finished genome was announced, we've closed a couple of dozen formerly unspanned gaps in the genome sequence, more than 150 reported problems in earlier versions have been resolved, and, importantly, alternative loci for some particularly complex regions of the genome, like the major histocompatibility complex, have been addressed. So all I am going to end up doing today in my five minutes is try to give you a snapshot of what the human genome looks like today. It's still roughly three billion bases; I never actually got the exact number down to the last one of the three billion. The current analysis indicates that there are just under 22,000 annotated genes, annotated meaning genes that have been understood or predicted to the point where you want to be able to say that this is a gene. Of those 22,000, almost 19,000 are considered high-quality annotations, meaning that all of the people who are in the business of annotating genes agree on what this particular gene is from the first base to the last base; everybody agrees on the structure of those genes, so those are entirely reliable to the best of our understanding. The additional 3,200 or so, the difference between those two numbers, are genes that are predicted; they're thought to be there, and there may be some good experimental evidence, but they haven't yet reached the stage of being undeniable. Those are the protein-coding genes. There are also 11,122 regions of DNA that by structural criteria almost look like genes, but they don't function. They may be regions of DNA that had formerly been functional genes and have become deactivated through mutational processes, or there are many other explanations, but I put them up there because they can make the counting of actual functional genes quite difficult. In addition to the protein-coding genes, one of the important developments in molecular biology over the
last decade has been the identification of genes that make an RNA product that functions, but functions in a way other than by coding for proteins, and there are 3,000 of those in the current reference version that are well annotated. There may be at least several thousand more by the time we're finished understanding those. So those are just some quick snapshot numbers of what the situation is today. The other aspect of the human genome, and actually of the genomes of many, many organisms, is that the technologies we now have available are allowing us to analyze and determine as well the structural variation, that is, regions of the genome more than one base, and often up to several thousands of bases, long that differ from the reference genome as a unit. These can be regions where in one individual there's an insertion of a large number of bases at a particular point, or an inheritable deletion of some section of the genome. They can be translocations, where sequence has actually moved from one chromosome to another, so the same sequence appears on two different chromosomes in different individuals. There are inversions, where a sequence in the genome has reversed its orientation, and there are copy number variations, where a region of the genome is repeated more than once, up to many hundreds of times. In the past decade, as I said, there's been an increasing recognition that structural variation plays a very significant role in determining the genetic basis of human phenotypes, including disease, and this is an area where we're still in the process of trying to figure out how to represent it in the reference genome. So in an attempt to keep to the time limit, and because most of what else I was going to say is going to be covered by my colleagues, I'll finish here. I just want to say, as an illustration of the dynamic situation that we're in right now: as you've heard many, many times, we can now sequence a human genome for somewhere between probably $10,000 and $20,000, but at the
same time it's important to realize that none of those individual sequences yet really matches the quality of the existing reference sequence, that even the reference sequence is still missing some things, and that one of the goals, as you'll hear about later, is to really get to the point where for that $10,000 or $20,000, or even $1,000, we can produce a genome sequence that is of the same high quality as what we now have in the reference. OK, so the next speaker, also from our Division of Extramural Research, is Adam Felsenfeld, who will talk about discovering what's important: comparative genomics and evolution, the Genome 10K project and so forth. Larry gave me a fairly wide remit to cover in 5 or 10 minutes, so I can't cover too many details, but I'll try to fit some in, and I hope I can give you some perspective on this. And to make matters worse, when I was thinking about this, the idea of why we sequence organisms other than ourselves... is my slide on here? I wasn't going to show any slides, but something that Francis said made me think I should. Sorry, every time I use a different computer. This is just for background, to give you an idea of the sort of number and distribution of multicellular organisms, animals, that have been sequenced. It obviously doesn't include plants, it doesn't include protists, and it doesn't include large ensembles of very slightly different strains of yeast, for example, but it is to give you an idea. It's a complex slide; I had a simpler version, but I couldn't find it today, and it's a little out of date, but I can run through it. So it's easy to understand why NHGRI and NIH should be sequencing the genomes of laboratory models: the genome sequences add tremendous value if you want to understand all the biology of that organism, and for those reasons we sequence mouse and rat and fruit fly and nematode and yeast and lots of others, ferret, for example, a model for respiratory disease. For basic research, I would venture to say that its
genome is indispensable these days if you really want to understand an animal. But what about other organisms? What about platypus, or elephant, or elephant shark? They're all fine animals, but what does this have to do with understanding human biology? The short answer lies in understanding comparative genomics: what you gain from comparative genomic analysis has made these and many other organisms relevant to understanding human biology in a way that I think was only very long-term, or theoretical, or a little bit intangible before. It's a really simple idea: comparing two genome sequences, looking for what's changed, or what hasn't changed that you expected to change, and correlating that with phenotype. In its widest interpretation, it's everything from comparing the genomes of two different animals to see what's changed and not changed to, for example, comparing the genomes of a tumor and healthy tissue, which Brad Ozenberger will talk about later. I'm going to talk about organisms. Another way to ask the question is, how do we know what's important in the genome? Just a little digression: simple genomes, like bacterial and viral genomes, are packed; the genes are packed together, the information is packed together. But more complex genomes are much sparser. The information, at least the bits that are interesting at first blush, is spread out, and it's also lumpy. By lumpy I mean that anything you can ask of the genome is not distributed evenly: the genes aren't distributed evenly, there are places where there are gene deserts, the rates of change aren't distributed evenly throughout a genome, and so on. So how do we know about all this? You'll hear a little bit more about the experimental side from Elise, but one way we can tell is comparative analysis. It's still one of the most useful ways to understand these issues, to understand what's functional. It's a
pretty simple idea. In comparing genomes of multiple species, the principle is, as Eric said before, that evolution has actually done the experiment, done all the hard work for us, by changing the genomes at a certain rate, and there are expectations for those rates depending on how related the species are. You can ask two kinds of questions. You can ask what hasn't changed; presumably it hasn't changed because it's useful, it's functional, and I'll tell you a little bit more about this later. You can also ask what has changed faster, much faster, than you expected it to. The way a lot of people think about these comparisons is in terms of how much time separates the genomes you're comparing. Across long distances, hundreds of millions of years to billions of years, such as comparing our genome to the genomes of fish, or multicellular organisms to those that are single-celled, you ask about genome features that typically change more slowly. So you can look at things like the addition of new genes, or how important protein domains, functional domains, came into existence. You can ask more general questions, like what are the genes that distinguish the multicellular lifestyle from the single-cell lifestyle? For example, you might expect that signal transduction genes, those involved in cell-cell communication, are going to be more present, more involved, or emphasized in multicellular life. What are the genes underlying different evolutionary adaptations, for example from creatures without an adaptive immune system, that's the mammals versus sharks comparison, or from organisms with a head to organisms that don't have a head? What are the genes that came into being coincident with the development of a certain kind of nervous system organization? So that was long distances. Across middle distances, for example humans compared to other mammals, in the range of many tens
to probably low hundreds of millions of years of divergence, comparison can tell us what's conserved and therefore likely to be functional, what's changing more slowly against the background of steady long-term change that is embodied in looking across many mammalian genomes. From this kind of analysis you begin to get a handle on genes, on intron-exon boundaries, on regulatory sequences, things that have changed more slowly than everything else, and when you see these come up in a comparative analysis, they give you essentially hypotheses across the entire genome for what's important. In fact, this is really the primary line of evidence when you hear people say that about 5% of the human genome is conserved among mammals and presumably has shared important functions in the conventional sense. There's a caveat here: there's stuff that isn't conserved that's probably functional as well, but elaborating on what's really functional is an extended discussion. With sufficient species you can get quite good resolution and quality; with the roughly 25 or more mammals that are now done to at least draft quality, you can theoretically get 6 to 8 base pair resolution. That's a nice resolution to have for an element that hasn't changed through evolution, because it's about the size of a bit of regulatory sequence, and that's the kind of thing you want to look at when you're trying to understand how genes work. There are elaborations on this: when you're looking across genomes, you can look at more than just what's conserved; you can also look at patterns of how things change. For example, we all know about codons, 3-base codons; if you see regions that are changing at every third base, you're sure, or pretty sure, that that's a signature of being in a coding region. So that's middle distances. What about shorter distances, less than 10 million years diverged, 1 million to 5 million, maybe even less? There you can ask about very rapidly changing features of the genome. It's hard to ask
about what's preserved, simply because there hasn't been enough time; quite a lot is preserved by accident. But you can ask about what's changing more rapidly, and those comparisons, for example us to chimpanzees and recently to Neanderthal, reveal a lot of interesting regions that are in us but not in our immediate relatives, and also allow you to ask what's changing very rapidly. You probably all saw the Neanderthal paper: it showed about 80 fixed non-synonymous differences in genes, and about another 200 regions that appear to have undergone rapid evolution, with lots of interesting suggestions and speculations on what their functions are, including, between us and Neanderthal, skin development, bone morphology and possibly cognition. These make really tempting targets for speculation and further work on the basis of the phenotypic differences. So Larry is signaling me that I've got to wrap it up. In summary, we use comparative genomics to ask what's functional and how things change over time. We spent the last 10 years doing a lot of this, a little bit less in the last few years, mainly because we wanted to take advantage of the opportunities that the technology afforded in being able to sequence lots and lots of humans and do a lot of medical sequencing. But comparative genomics across organisms is still important; there are lots of questions that can be addressed. For example, what's the genomic basis, in mosquitoes, of the difference between disease vectors, those that vector disease, and those that don't, in malaria for example? That's an active, ongoing project. Another kind of comparative sequencing: looking at a pathogen whose genome, and hence its presentation to the human immune system, varies quite a lot, can you sequence thousands of isolates, find regions that are not changing in all those thousands, and use those as vaccine targets? That's another kind of comparative approach.
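The two comparative ideas described above, looking for stretches that have not changed across many related sequences (such as invariant regions across pathogen isolates) and reading the pattern of change (substitutions concentrated at every third base suggesting a coding region), can be illustrated with a toy sketch. This is not how NHGRI or any consortium actually runs these analyses; real analyses score conservation with phylogenetic models over whole-genome alignments. The sketch assumes short, already aligned, ungapped sequences, and the function names and the 0.6 cutoff are hypothetical choices made only for illustration.

```python
"""Toy comparative-genomics sketch: invariant windows and codon-position substitutions.

Assumptions (illustrative only): sequences are already aligned, ungapped and of
equal length; the function names and the 0.6 threshold are hypothetical.
"""
from typing import List, Optional, Tuple


def invariant_windows(aligned: List[str], min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of at least min_len columns where
    every aligned sequence carries the identical base."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # A column is invariant when all sequences share one base there.
    invariant = [len({s[i] for s in aligned}) == 1 for i in range(length)]
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, ok in enumerate(invariant + [False]):  # trailing sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def substitutions_by_codon_position(a: str, b: str, frame: int = 0) -> List[int]:
    """Count mismatches between two aligned sequences at codon positions 1, 2, 3,
    assuming the reading frame starts at offset `frame`."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    counts = [0, 0, 0]
    for i in range(frame, len(a)):
        if a[i] != b[i]:
            counts[(i - frame) % 3] += 1
    return counts


def looks_coding(counts: List[int], min_third_fraction: float = 0.6) -> bool:
    """Crude heuristic: call a region coding-like when most substitutions fall at
    the third codon position (the threshold is an arbitrary assumption)."""
    total = sum(counts)
    return total > 0 and counts[2] / total >= min_third_fraction


if __name__ == "__main__":
    isolates = ["ACGTTGCAAC", "ACGTAGCAAC", "ACGTCGCAAT"]  # made-up toy isolates
    print(invariant_windows(isolates, 4))  # [(0, 4), (5, 9)]
    counts = substitutions_by_codon_position("ATGGCTGAAAAACTG", "ATGGCCGAGAAGCTG")
    print(counts, looks_coding(counts))  # [0, 0, 3] True
```

Run as a script, it prints the invariant windows shared by three invented isolates and the codon-position tally for an invented pair of sequences; the toy sequences are not real data. Keeping the conservation scan and the codon tally as separate functions mirrors the two separate questions in the talk: what hasn't changed, and how things have changed.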
And many other questions, and I think I'll just stop there. For those of you who haven't added these up, this is about 70 metazoan sequences that are done and about another 90 that are in progress or haven't quite started yet, and this is just NHGRI and a few others. There are another 100 on the docket next year from BGI, and we'll have more as well, and so will others, and there are some that I can't even track. I'll stop there. The next speaker is Elise Feingold, talking about more than the genes: controlling the genome. Thanks, Eric. So Adam really set this up very well; he mentioned that I'm going to talk about more of the experimental efforts to identify functional elements in the human genome and to begin to understand how genes are regulated. The big question we've been talking about is, how do we read the human genome sequence? There's no instruction manual. We really don't initially know very much about the different punctuation marks, and as Adam discussed, evolutionary conservation can help us to identify functionally important regions. About 5% of the human genome sequence is highly conserved, and about 1.5% codes for proteins. Computational approaches are really good at helping us identify where the coding sequences are, but they're not very good at really understanding fine gene structure, such as what all the different alternative splice variants are. So this is really a big challenge, and an even greater challenge is identifying the regulatory regions: many of them are very far away from genes, and we don't even know which regulatory regions interact with which genes. And so we wanted to take an unbiased experimental approach to identify all the functional regions. Francis, I think, set this up earlier today too, talking about one project in this direction, the ENCODE project, for Encyclopedia of DNA Elements. The goal of the project is to compile a comprehensive encyclopedia, a catalog, of all the sequence features in the human genome and in the genomes of selected model organisms. Much as we did with
the human genome sequencing, we started out with a pilot project. We focused on 1% of the human genome sequence, which is about 30 megabases, starting back in 2003, and then in 2007 we expanded this production effort across the whole human genome sequence. Also, as we did for sequencing, we are studying some model organisms, C. elegans and Drosophila. We initiated these projects in 2007 to try to identify all the functional elements in those genomes, the advantages being that these are much smaller genomes, they are less complex, there is a lot of biology known about them, and we can do genetic manipulation to really test our hypotheses about the function of the various elements. This past year, with economic stimulus funding, we initiated a small effort in mouse ENCODE. We have limited production efforts to identify functional elements in the mouse, really using that to see how it will help us in identifying functional elements in the human genome. And the last component is technology development. We've had a number of efforts in that area, and I'm sure we'll be doing that again in the future, not only for identifying new functional elements but for having better methods for identifying functional elements that we already know about. Then there are the functional elements that we're studying, and the middle gray area describes the technologies. I'll just say that in the pilot project most of those had been array-based, and now almost exclusively these have been replaced with sequencing, to really give us very high accuracy and very high resolution in identifying the functional elements. We don't have a pointer here? Great. Efforts are underway to find gene structures, and a lot of this actually involves manual curation. We're looking at RNA transcripts, both coding and non-coding. We're looking at cis-regulatory elements, such as promoters and transcription factor binding sites, as well as long-range regulatory elements, such as enhancers, repressors, silencers and insulators. We're
also using DNase digestion to look at DNase I hypersensitive sites, and what this is really doing is identifying areas of open chromatin, presumably where DNA-binding proteins bind. We're looking at epigenetic modifications, such as DNA methylation and histone acetylation, and I think we're going to hear more about that in the next session, as well as some limited studies looking at DNA replication. This is a slide from about a year ago showing that there's lots of data and lots of different data types. This is a screenshot from the UCSC browser, where all of our data is going up, and if we included all the different data types it would go way down. I don't know if we can go any further down in this building, but it would go quite far down, maybe to the Metro. I really put up this slide to remind me to tell you that these projects are being done by large research consortia, and one of the great advantages of these research consortia is that they are focused on many common cell types and are using common data standards and formats, which is really facilitating the analysis of the data. So where are we now? These projects are really in their large-scale data production phase. The ENCODE and modENCODE projects both have about a thousand data sets that have been submitted. It's really been quite challenging to analyze the data. We've had to develop common data reporting formats, data standards and analytical tools, especially since 2007, when these production efforts started, was really just when we were beginning to use these new high-throughput sequencing technologies, and a lot of the analytical tools had to be developed from scratch. There are integrative analyses that are ongoing; right now they're ongoing for each individual organism, with long-term plans to do integrative analysis across all of these genomes: I mentioned worm, fly and human, and I forgot to add mouse to that. And we'll be following up on and expanding on some of the findings in the pilot project, one being that the human
genome is pervasively transcribed, and another being that, as I think Adam mentioned, many functional elements seem to be unconstrained in evolution, so we want to look beyond just the conserved sequences to identify functional elements. So where are we right now? In the pilot project, as I told you, we were focusing on 30 megabases, and about 5% of those sequences were constrained. At the end of the pilot project, about 3 years ago, we had assigned function to about 60% of these conserved sequences, and about 40% remained unannotated. Now, in the expansion to the production phase, we've about halved the amount that's unannotated: about 20% of the sequences are unannotated today. In part that's because we've expanded the number of cell types that we're looking at; as mentioned earlier, functional elements aren't functional in every single cell type, so you really have to study a broad number of cells to be able to find different functional elements. We're focused here just on the ENCODE regions, but if you extrapolate this out to the whole human genome, we've assigned function to about 67% of the conserved sequences. So the question then is, how will these catalogs of functional elements be used? They're going to greatly enhance our understanding of gene regulation, and this is really on a spatial, temporal and quantitative level, so we're not only discussing whether something is on or off but really how much is being expressed. We want to know who all the different players are, how they interact, where they're expressed, and how variants affect gene expression. And ultimately, can we predict gene expression from sequence? Can we figure out the grammar rules so that we can look at a sequence and then predict its expression? And then ultimately, how can we manipulate gene expression? We're also going to use these catalogs to enhance our understanding of the genetic basis of disease, and
you're going to hear a little bit more about this in the next session, but many of the genome-wide association study hits find SNPs in non-coding regions, so we want to know how these SNPs or disease mutations in non-coding regions alter gene expression and can contribute to disease. It's also going to enhance our understanding of epigenetic contributions to disease. We're studying several epigenetic marks in ENCODE, and there's a sister project, Epigenomics, funded through what was formerly called the Roadmap and is now the NIH Common Fund, that's focused on identifying these epigenetic marks, with an emphasis on how you can use them in disease studies. I'm probably running out of time, so I'm going to run through this really quickly, just to give you an idea of where things are going in the future. There are two papers that were just published in Science in April that studied the variation in transcription factor binding sites and in chromatin structure among individuals. So first we started with the catalog of just what the functional elements are, and now we're applying this to understanding the variation among individuals and how that can influence gene expression, and then also how you can make links from GWAS hits to function and ultimately to disease. This slide refers to a paper that was published last year. This group had identified four regions in the 8q24 region with alleles predisposing to many different cancers, including prostate, breast, and colon cancer. The regions were really far from annotated genes and were of unknown biological function. What this group did was to generate lots of different data types, very similar to the data types being generated by ENCODE, including RNA expression, histone modifications, and binding sites for RNA polymerase and for the androgen receptor, a transcription factor. They identified several enhancers, and they found a SNP in one enhancer, right where the transcription factor bound, and they found that the
prostate cancer risk allele was actually associated with stronger binding of that transcription factor and a stronger androgen response. So this is really just a glimpse of how these catalogs are going to be used and how they apply to studying function and disease.

The last member of this panel is Jane Peterson, talking about basic research on the genome: the Human Microbiome Project and the Knockout Mouse Project.

Larry asked me to cover two projects: the Knockout Mouse Project and its extension, and also the Human Microbiome Project. Francis did tell you a little bit about the Knockout Mouse Project, which I'll refer to as KOMP, so I can probably go through some of these slides relatively quickly. I'm sure many of you have heard and reported on the fact that the mouse is a very important biomedical research model. It is really the only mammalian model for studying human disease that has good genetics behind it and all the features developed over the last 50 years that make it a very tractable system. It's of course not perfect, but it is the least expensive and the most functional model that we have right now. Importantly for this project, the 2007 Nobel Prize was given for the discovery of how to introduce specific gene modifications into ES cells, and of course these ES cells can then be made into mice, and the mice can then be studied, allowing you to look at the effect of that gene on what we call the phenotype.
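To make the 8q24 enhancer example from the previous talk concrete: one common computational way to ask whether a single-base change could alter transcription factor binding is to score both alleles against a position weight matrix (PWM). The sketch below is a hypothetical illustration, not the method used in the study Elise described, which measured binding experimentally; the motif counts, the 0.5 pseudocount, the uniform 0.25 background, and the two 6-bp sequences are all invented for the example.

```python
import math
from typing import Dict, List

BASES = "ACGT"

# Hypothetical base counts at each position of a 6-bp motif, as might be tallied
# from known binding sites. Invented numbers, not a real androgen receptor motif.
MOTIF_COUNTS: List[Dict[str, int]] = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 2, "C": 6, "G": 1, "T": 1},
]


def log_odds_matrix(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return matrix


def motif_score(seq: str, matrix: List[Dict[str, float]]) -> float:
    """Sum the per-position log-odds scores for a sequence as long as the motif."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match motif length")
    return sum(matrix[i][base] for i, base in enumerate(seq))


pwm = log_odds_matrix(MOTIF_COUNTS)
reference = "AGACTA"  # hypothetical non-risk allele: A at the last position
risk = "AGACTC"       # hypothetical risk allele: C at the last position
print(round(motif_score(reference, pwm), 2), round(motif_score(risk, pwm), 2))
```

Because the two sequences differ at a single position, the score difference comes entirely from that one column of the matrix. A higher log-odds score only suggests a better match to the motif; confirming stronger binding and a stronger androgen response, as in the study, takes experimental data.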
So Francis showed you this triangle, and this is the vision for the KOMP project. As you all know, we sequenced the mouse genome early in the Human Genome Project. The next step, from 2006 to 2011, was to develop ES cells. What we've been doing in that time period is knocking out, or rendering null, each gene in the mouse genome; that means we insert something into the gene so that it doesn't work anymore, and then make these ES cells. As you can see, the goal here was for 17,000 genes to be knocked out. Mark told you that there are 18,000 that are well annotated in human; in mouse the number is smaller, and even 17,000 may be hard to get to. You can see that the progress here has been very good; we're on track for making our goal. These are the U.S. participants, and these are the international participants in this project. We expect to complete the U.S. part by the end of 2011 and the European part a little bit later. Once we have all the genes knocked out, or even before, we can start working on making these ES cells into mice. We received a supplement from the stimulus funds last year, so we've already started making mice from the ES cells. Following on from that, we want to start looking at how gene expression in these mice is changed compared to normal mice, and then start doing phenotyping, that is, running these mice through batteries of tests that look at hearing, for example, or neural responses, or measure their metabolism, and very often autopsies are done when the mice are killed. This is what we call KOMP2, and that is a program that we'll be starting next year. This gives you a bit of an idea of the timeline: this is the gene knockout part of the timeline, KOMP1, and here is KOMP2. We've put together $22 million a year for the next five years in order to make the mice and start running them through a battery of different phenotyping tests. The important thing about
this approach is that very often, when investigators make a knockout mouse, it's because they're studying a particular gene that they expect to be related to diabetes or to heart disease or something like that, so they really only put it through the phenotyping tests for, say, heart disease. This approach will not be targeted like that; it will be more general, broad-based phenotyping, where every knockout mouse that is made will go through a broad battery of tests, and it's already been shown that this very often turns up phenotypes completely unrelated to the phenotype you might expect. Okay, now we'll go on to the Human Microbiome Project. The human microbiome is very simply the bugs: the viruses, the bacteria, and the eukaryotic microbes that live in and on you. This is a relatively unexplored area of human health. We know that there are ten times more bacteria, or microbes, living on you than the number of cells that you have in your body, and yet it's almost completely unexplored. The microbiome project started about two years ago. The goal is to catalog the microbes that inhabit the human body, to examine whether changes in the microbiome can be related to health and disease, and then to create a community resource that will enable this type of research to go forward in the future. It's a five-year, $157 million project, and as Francis told you earlier, for all these large community resource projects we require data and resource release, and that's what's happening; here are the URLs for the project. This is another timeline that shows you that this is a very diverse initiative. The centers part of this initiative is looking at two important questions. One is, what is the complexity of the human microbiome at different body sites? We're looking at five different body sites, and multiple sites within those, and that data is starting to come out now in the public domain. They are also looking to see if we have a core microbiome,
that is, do humans share a certain number of microorganisms in common that meet some basic need we may have for microbiome activity, or are we all unique? Then there's the demonstration project part of the initiative. These projects are designed to see if we can show a correlation between changes in the microbiome and disease or health state. There are 15 of these, and I'll show you what they are in a minute. In fact, right here today we're doing some of the review to decide which of these 15, about 7 or 8, will ramp up to larger-scale projects. This is really where I think you'll see a lot of interesting developments over the next couple of years. Correlations can be found, but showing causation is something else: there's always the question of whether the disease comes first and the inflammation caused by the disease affects the microbiome at that site, and figuring out whether it's a change in the microbiome that causes the disease or vice versa is tricky. There's also computational tool development that's needed, and technology development, particularly to isolate some of the bacteria, or microbes in general, that don't grow in labs and are very hard to get our hands on; most of the microbes in the human body are not cultivable, which makes them very hard to study. There's the data analysis and coordination center, which handles the data, and then ethical, legal, and social issues studies, because a number of issues arise when you start talking about looking at people's microbiomes. These are the demonstration projects, and you can see they are pretty broad in range: there's psoriasis; the virome and febrile illness; obesity; acne; Crohn's disease; the urethral microbiome in adolescent males; necrotizing enterocolitis (I always say it wrong), which is a deadly disease of neonates; this is
esophageal adenocarcinoma; vaginosis; another on Crohn's disease and ulcerative colitis; abdominal pain and intestinal inflammation; atopic dermatitis and immunodeficiency; another vaginal one; and another Crohn's disease one. So most of them are based in the gut, and the rest are looking at either skin or vagina, or at viruses. The core part of the project, which I told you about at first, is also looking at the oral cavity, but unfortunately none of the demonstration projects looking at the oral cavity got funded. This is our website, HMPDACC.org; I invite you to go there to get more information, and I'll stop there.

Time for questions. Please come up to a microphone and ask any of these panelists any questions you might have. Uh oh, everybody looks stunned. Here we go.

My name is Bob Codd. I'm a freelance writer; I've done work recently for NIDA and for the Drabkin Symposia series on medical genetics. My first question is to Dr. Guyer. You basically listed roughly 35,000 sequences which are somehow gene or gene-like, the pseudogenes and the mRNA-coding sequences, but what percentage of the total genome is covered by those 35,000 coding things? And then I would also ask the different panel members to say something about what they think the rest of that stuff is. It wasn't too long ago that the paradigm was that it was junk, and I kind of want to know what you think is there. Thank you.

Is this on? In terms of the amount of the genome that's covered by the coding elements, it's about one and a half percent. In terms of junk DNA, Sydney Brenner a long time ago made a distinction between junk and garbage: junk is what you keep in your attic because sometime you're going to want it, and I've always looked at the concept of junk DNA with that in mind. But I think what you heard from Elise, and I'll let her talk a little bit about it also, is that it is clear that there is information, much of which has to do with gene regulation and
expression, in the non-coding elements. How much of the rest of the genome incorporates that, we're not clear, because we really don't understand very much of it at this point. Elise, do you want to add anything?

I don't really have anything to add. We do know that there are some functional elements that don't show very high levels of conservation, and I think as we get more and more mammalian genome sequence, our notion of what's conserved is changing as well, so we're really just beginning to learn about that.

I also want to add that beyond the parts of the genome that control gene expression, which we've been talking about here as functional, there are other parts of the genome, like centromeres and telomeres, that are important for chromosome mechanics, dividing chromosomes up among progeny cells, but we have kind of a spotty understanding of some of those as well. Just to give you an example of what you have to wrap your mind around: human centromeres can vary from chromosome to chromosome quite a bit, and even on the same chromosome, depending on what you got from your mother or your father, they could be millions of base pairs different in size. But they're clearly functional; if you don't have one, a chromosome won't do very well. The important thing seems to be a general pattern of repeats rather than any particular length or any particular sequence. So what's functional about that? If you can get rid of 3 or 4 million base pairs of a centromere and still function perfectly well, as long as you've got another million left, what's really functional? There are subtleties here.

Just keep in mind, and Adam implied it, that the focus of what you heard from some of these speakers related to function defined at the primary sequence level, but there is almost certainly, and we'll probably begin to understand it better in the future, another code of the genome
that relates to three-dimensional structure. That will partially relate to the primary order of the letters, but there might be more than one way to get a given structure; different sequences might be able to adopt the same three-dimensional structure. That's probably what's going on in the centromere, and there is some early evidence of it going on elsewhere, where three-dimensional structure is playing a role, so that even though two sequences might be very different, they might function in a similar way.

Eric, can I add one thing? Very interestingly, I remember that several years ago someone at LBNL deleted a megabase of genome from a mouse and said they could not find any phenotype. Now, maybe the right phenotypic screen hadn't been done; nonetheless, there are parts of the genome that we're going to have a very hard time figuring out. There are also big differences, at least what maybe ten years ago we would have thought of as big differences, even between individuals. There are probably tens of genes in some of you that are functional that in me are not functional, and vice versa. Mark mentioned the Genome Reference Consortium; they try to represent genes that are functional, so they can represent lots of different things in the reference. You'd think you'd want to represent two things in what you call a reference: one, if it's a gene, it should be the functional version of it; and two, it should be something that's around in a lot of the population. But it turns out that there are cases where most of us don't have a functional copy of a gene, and only a few percent of the populations that have been looked at actually have a functional copy. So there's weird stuff like that going on.

Steve Sternberg, USA Today. Dr.
Peterson, did I hear you say correctly that the oral microbiome isn't going to be funded? And if that's correct...

What I said was that there's not one in the demonstration projects. In the more basic large-scale project that's being done to look at how many microbes are there, and whether there's a core microbiome, the oral cavity is being well sampled. It's also true that the Dental Institute is funding a number of projects. It's only that in this one particular call for projects, none of the oral cavity proposals were evaluated well enough by peer review to fund.

Okay, we're going to move on. Let me thank these four panelists. We will shift gears, and the second panel will discuss genetic variation and why it matters. We will have three speakers, the first from NHGRI and then two outside speakers who came down from Johns Hopkins to join us. First, Lisa Brooks will be talking about HapMap and the 1000 Genomes Project.

Thank you. I oversee the program in genetic variation research at NHGRI. I also oversaw the HapMap, and along with Adam Felsenfeld and Jean McEwen I'm overseeing the 1000 Genomes Project. Why do we care about genetic variation? There are several reasons. We're here today because of the wonderful achievement of the human genome sequence, but at any point in that sequence it was just one person's sequence. Genetic differences among people are what give rise to the differences that we all have, including differences in risk of disease and differences in response to treatment, so that's something that NIH cares about a lot. Another reason is that genetic variation itself can be used to map the variants that affect disease, and finding the variants that actually contribute to disease lets us then study them to understand the mechanism of how they work, which means we can then try to intervene, to prevent disease or to develop treatments. As for the types of variants, Francis has already discussed SNPs. Here we have the same region of a chromosome, so say
it's like the left end of chromosome 1. We have this one example of this chromosome from three different people, so this is the same region in three different people, one chromosome from each person. There are certain sites, the SNPs, which are the single base pair differences, single-letter spelling differences, where some individuals, say, have a C and other individuals have a T; here's another SNP. These are the most common types of variants among individuals. There are other types of variants, such as what we call indels, insertions and deletions; Adam referred to this in some of the cases of people having different sets of genes. These can be something as short as this, one or two base pair insertions or deletions, up to much larger regions. There are also other types of variants, where pieces of chromosome are in different places in the genome, but they're not as common. So how much variation is there? Chromosomes from any two people differ at about five in a thousand sites when you look over a lot of the genome, including insertions and deletions, so we're about 99.5% the same. The 1000 Genomes data are just beginning to show this, so these are very, very rough estimates: there are more than about 30 million variable sites in the DNA per population, and probably more than about 80 million worldwide. So what's a haplotype? Again, here's part of a chromosome. The dots show the parts of the sequence that are the same in all individuals, so I'm just showing you the variation, and here are six SNPs. This particular chromosome has these particular SNP variants on it, and another chromosome has another set of SNP variants. With six sites and two possibilities at each site, that's two to the sixth, so there are 64 possible chromosomes here, but in fact, in most places in the genome there's really only a handful. We're a young species; we haven't had enough time to have a
lot of recombination mix things up. This was really the basis for the HapMap: there are regions of the genome where historically there has not been much recombination, so you have sets of variants that are highly associated with each other. What that means is that if you're trying to figure out what the variants are in somebody, you don't need to know every single variant; knowing a handful of variants in any particular region will give you the information on just about all the variation in that region. Those variants we call tag SNPs: if you just look at this SNP and this SNP, say, you'll know the rest of the SNPs in that region, so this allows a lot of efficiency. The major use of the HapMap, then, was to show the pattern of variation across the entire genome, and here are examples in ten regions. The red regions are places where you have bunches of SNPs that are highly associated with each other, so in a region like this you don't need to choose many SNPs in order to describe the variation in that region very well. This sort of pattern that the HapMap shows is the basis for letting chip and microarray manufacturers go across the genome: in regions where there are these big red triangles you don't need many SNPs, so you don't need to put many down, while in regions that have much less of the red triangle you need to put a lot more SNPs down. Since genotyping is somewhat expensive, though of course getting cheaper, and sequencing is much more expensive, this was a way to scan across the genome much more efficiently, knowing that we're being comprehensive and yet as cost-efficient as possible. So one of the major products of the HapMap was this pattern of variation, used to create chips that any disease researcher can use to study any disease they're interested in, in genome-wide association studies, or GWAS. They take people with a disease and people without the disease, apply
these chips across the whole genome, and look for differences at these tag SNPs. If you see a region of the genome where the frequencies of the tag SNPs are the same in people with and without the disease, there's no evidence of a genetic contribution to the disease in that region. On the other hand, if you find a region where the frequencies differ, that's a candidate region where there's a variant that actually contributes to disease. So once a GWAS study has been done, there'll be a set of regions that people want to follow up on, saying, okay, this region likely contains a variant that contributes to this disease. The question then is what else is in that region, because the chip is just a sampling, and you still want to know what all the variants are. That was the basis for the 1000 Genomes Project, which is basically looking for most variants in humans by sequencing samples from 2,500 people from 27 populations around the world. It'll find almost all the variants in these samples that are at a frequency of 1% or higher, which is a very large fraction of human variation. So this is how a GWAS study works, in a very diagrammatic form. You find regions that are associated with disease, as I showed you before. Here's a region of the genome, here are two genes with their exons and introns, and here's the HapMap data. It's fairly coarse across the region, but the red boxes show that these SNPs are highly associated with each other and they're associated with the disease. So then the question is, what's all the variation in the region? You can see these three variants, and here are these variants again, but based on either sequencing in the GWAS study itself or using the 1000 Genomes data, which is much cheaper because you're not doing the sequencing: the data is already in databases. It's very quick, it's pretty comprehensive, and it's much finer resolution, so you can see just about all the variants in this
region. So then the question becomes: you've got a bunch of variants here, they're associated with each other, and they're associated with the disease; which variants are causal and which are just along for the ride? Once you've done this sort of study, that's about as far as you can go statistically. At this point you have a set of variants, and the way the genome is organized, with these blocks of association, means you can't go further than that based on this sort of study. What you really need to do then are functional studies, and Elise gave you some examples of those; you really have to look specifically at the variants themselves. It's a different type of biology. Finding those variants is the easy part; figuring out what they do is harder. When you do a genome-wide scan, you have a chip or you do the sequencing, and you just kind of do it. Here you actually have to do real biology on each disease, on each set of variants: you study the sets of variants experimentally to figure out which are the causal ones. That's just one short phrase, but a huge amount of biology goes into that in order to understand their function, how genes interact with each other and with the environment. So there are two ways of using genetic variation. Here you're using genetic variation as a tool to understand the biology and the disease process: how does the disease process work, as revealed by the variants that are involved in it, and then, mechanistically and experimentally, how does a particular variant actually give rise to a disease like Alzheimer's? So you're using variation to understand the biology. Of course, the other way of using genetic variation is for individuals. We all have particular genetic variants that give us risk for various diseases, so we can use this information, built on the other information about how things function, to try to prevent, diagnose, and treat disease. So our next speaker is
Aravinda Chakravarti, who came down for the day from Johns Hopkins, and he will be speaking on findings from GWAS studies and "dark matter," the missing heritability.

Thank you for being here. I think what I'm going to do is not show you slides and save myself one minute; Larry gave me, what, five or six minutes, and he's going to wave when I've said enough. What I thought I would do is give you a snapshot not only of what we've got here but of what we intend to do going forward with what we've learned over, in some sense, the last ten years, though I'll focus largely on the last few years. For a geneticist like me, interested in studying human genes and how they impact human disease, this field has a very long history. In fact, the first time anybody (I don't even know that they were called human geneticists then) figured out that genes vary, that their products vary, and that this impacts physiology and phenotype was in the discovery of the most common blood group marker, which all of you know: the ABO blood group system, discovered in 1900, more than a century ago. And for a long period of time before the Human Genome Project, it's not that we didn't know of genes or of variants, or that we couldn't connect them with human disease; it's that we had no algorithm, we had no crank, we had no specific organized way of finding more variants and connecting them to human phenotypes. In all of that period, what we learned is that yes, there are genes, and there are mutations or variants within these genes that impact a whole variety of human traits. In fact, that was the major reasoning for mapping and sequencing the human genome. But we knew that we missed a lot, and the question is, beyond numbers, that is, more variants, more diseases, and more phenotypes, what did we miss? The second thing I wanted to point out is the role of genes, and by genes here I mean variation in genes, that is, inter-individual variation: the fact that my sequence is different from, say, Eric Green's sequence or any of yours. The fact that
this difference, or these differences, account for a significant portion, on average about 50%, of any human phenotype we pick, whether it's height, propensity to some neurological disease, some circulatory disorder, or even some rare disorder like cystic fibrosis, has been figured out from two kinds of evidence. We normally talk about twins, identical twins being much more similar than fraternal twins, but the crucial evidence for genes came from a kind of titration effect. If you study my first-degree relatives, such as my siblings, and then go on to study, say, second-degree relatives, like my aunts and uncles, and third-degree relatives, such as my cousins, you will find that the correlation between these relatives for anything you measure falls off by 50% with each step. This 50% fall-off is in fact the hallmark of genes; we know of no other biological process that gives us that 50%. Many other things fall off, but this has in fact been the persistent evidence for genes. This is not to say that the environment and other factors are not important, but they act in concert with genes to create who we are. The third thing I want to mention is this idea of doing genome-wide scanning. We've been doing this for a while, but not with the efficiency with which we've done it since the human genome sequence came about. Most of you will remember that everybody argued for finding genetic markers and making physical maps and genetic maps of the human genome long before we had the sequence, and the idea was to make a lot of progress in mapping the positions of genes that caused, for the most part, rare disorders, because we could collect families and trace the disorders within families, and fewer markers did the job. This of course led to the cloning and identification of, I think the number is close to 4,000 entities. Over this period of the Human Genome Project, depending on the dates that you pick, let's say 1988 or 1990, so about 20 years, we have come to know the molecular basis
of 4,000 entities. They don't all map to 4,000 genes, rather to a smaller number of genes, but we now know, in reverse, if we start with a disorder or phenotype, what its basis would be. So I want you to remember that timeline. But when we tried the same methods on the major chronic diseases of mankind, the ones that affect the lives of most of us or our family members and friends, this process didn't work. We had, as geneticists, a fairly good inkling as to why that would be the case, but a hypothesis is never close to the kind of certainty of proof we would want. With the human genome sequence, even the draft sequence, there was talk, and then there was discussion, and inevitably among scientists there's some disagreement, but nevertheless the community launched what's called the HapMap project; sometimes the dates blur, but I think it started in 2001 or 2002. Yes, you're right. The first fruit of the HapMap was finding common genetic markers, defined no longer by function but by the nature of the variant and its frequency: single nucleotide polymorphisms, of which I think the HapMap has some four million. These are all common markers, and starting in about 2006 came the first results of using this information in studies of human disease. This use required the existence of large cohorts of patients, who had to be well phenotyped, but more important than that, it required a technology to take what existed in databases through these samples and families. This is the era of genome-wide association studies. By my counting it has only just begun, and we have, I think, something close to a thousand sites in the genome in the literature that are implicated in various disorders, for example type 2 diabetes and atherosclerosis, as well as many, many other medically important traits, and traits that are teaching us a lot about genetics, such as height. Now, most of the markers that we had in HapMap,
and you've heard this, were not causal by themselves, and the reason is that the four million is only a very small amount of the total variation that exists in humans. We've turned out to be less variable than, for example, many of our cousin species, if you will, the great apes, but nevertheless we are much more variable than we ever thought. So what is it that we learned from the GWAS studies? I'm going to start with the positive first. We learned a huge amount of new biology, a biology whose surface we haven't even scratched. We've clearly learned, for example, in the cases you've heard of, the importance of the complement pathway in age-related macular degeneration. We've learned quite convincingly that the fundamental problem in type 2 diabetes is insulin secretion rather than insulin resistance; the resistance does come about later. And in many of the studies that we've done, for example on blood pressure as a trait and on hypertension, as well as on something that's been fascinating us over the last few years, something that we never understood, this entity called sudden cardiac death, we now have in each of these traits at least 30 targets with compelling genes that we need to study. The most common drugs for hypertension, for example, are drugs that modulate what's called the renin-angiotensin pathway, and it's come as quite a surprise that none of the common variants that we know to date map to that pathway. This is not to say that that information is not helpful, but that there's much, much more to learn. Sudden death was quite an interesting example; I'll just take another minute or so. As we've had advances in medical knowledge and public health, the mortality from heart disease has clearly been reduced, remarkably so over the last 25 years; not all of the morbidity, but clearly the mortality has decreased. What's happened then is that something else has come to prominence, and that something else is sudden cardiac death, of which the
arrhythmias are thought to be the major cause so far. But there must be other factors that lead somebody who has none of the known cardiovascular risk factors to suddenly, meaning within 24 hours, literally drop dead. There are the many athletes who have succumbed to sudden death; in some cases we know of rare syndromes that predisposed them, or of some other physiological stress, but these are athletes, and they stress themselves many, many times. The question of why, then, is one of the things we don't understand. So sudden death, I would say, was understood before the era of genome-wide association studies only in a handful of rare Mendelian syndromic cases, and from 2006 until today many laboratories, including ours, have identified the first major gene, major meaning having a significant impact on the ECG through the QT interval. Not only has this shown us a factor that affects the ECG and puts some part of the QT interval in the risk zone, but we've shown that it affects sudden cardiac death in a major way, and we've gone all the way to having a mouse model now, in four years. That might seem to some of you a very long time, but it's a very, very short period in which to move from almost total ignorance to understanding at least a big fraction. Now, a big fraction doesn't mean 100%: with about 30 of these loci we explain something like 20% of the genetic variation, so we've got a long way to go. And the reason we think we have a long way to go, this whole idea of missing heritability, is that one uncovers one genetic hypothesis at a time. As scientists, we always study hypotheses in a very specific way, one at a time. These are observational studies, not studies in experimental systems; we come as we come, and we volunteer for studies the way we do. And so the first hypothesis it was possible to study with the knowledge and the technology that we had concerned these common variants, but it is entirely true
that they haven't explained the whole picture, and I don't think it was even a fair expectation that they should have. The missing heritability is a problem, but that doesn't mean the problem lacks many adequate competing hypotheses. I will tell you at least my bias, and I'm sure that Andy Feinberg is going to tell you his. They are hypotheses, we need to test them, and each is going to uncover more of the details that we don't know. The 1000 Genomes Project, which is in full swing now (again, I forget, but I think the first discussions were in Cambridge about three years ago), was in fact launched with the idea that we need not only to go wider, studying a much more extensive group of humans from more populations and more individuals per population, but also to look deeper into genetic variation, not only at common variants but at uncommon ones as well. Lisa gave you a figure of about 30 million variants that we expect per population. For Europeans, who are somewhat less variable than other human populations, our estimate (we can quibble a little) is of the order of 20 million or so. Four million, or rather 5 million, are above a frequency of 10%, and they are within reach today. About another 5 million are between 1 and 10%, and they are almost within reach; that is, they will be in the next 6 months. But there's a substantial number at far lower frequencies, and this is where sequencing and other technologies will come in. The point I want to leave you with is that the hidden heritability problem is partly explained by the large share of genetic variation that we are only uncovering today. I know we sometimes get into esoteric debates about what's rare and what's common, but in fact almost all of the studies are moving in the direction of finding a much bigger share of the genetic variation that exists in human populations, and of developing technologies and methods for studying it in large groups of well-phenotyped individuals. So I'll stop there. And our last speaker for this panel, also from Johns Hopkins, is Andy
Feinberg, who will be talking about epigenetics. I was trying to think about how to explain epigenetics in 5 to 7 minutes, so I thought, well, I'll do a somewhat pompous thing called a gedanken experiment, which is an old-fashioned word that means thought experiment. There are much better ones than mine, like Schrödinger's cat, which is alive and dead at the same time because you can't exactly predict when radioactive decay takes place, and Maxwell's demon, the famous little monster who opens a little door to let particles go in and out, to explain thermodynamics. So I'm going to give you my gedanken experiment, and can I just say, please, it's meant to be lighthearted; do not take it at face value. So what makes us different? Okay, this is the great question. That's what Lisa raised, and I think it was Aravinda, maybe, who was making the point about why we even called ourselves human geneticists decades ago. That's what we really want to know: what's responsible for phenotypic variation? Okay, so here's my example: the United States Congress. What makes them different? Every time I open up the paper, I never read about how they're the same. Some want this thing and others want that thing, they don't agree on this and that, and there's the House and the Senate, and they don't agree either. So how do you explain that from a genetic point of view?
Well, each one of them, the 535 people, has 3 billion base pairs of DNA, and there are about 3 million differences in DNA sequence. So I guess that's one explanation that might explain the differences; that's about a tenth of a percent. All right, but what about this? Although those two in particular don't really look that different from that particular picture, if you go to the primate house at the Washington Zoo, you're not going to see a conference like this. There are 3 billion base pairs of DNA in them also, but there are about 30 million differences in DNA sequence, and although the thing about the Congress was sort of a joke, this is serious: somewhere in those 30 million differences is the explanation for the huge differences between them and us. So now, what if you were to do the following? Take the person sitting next to you and autopsy them, which I wouldn't recommend, but if you did, you would see all those different organs, and they look very different from each other: the brain, the liver, the eye, the heart, the lungs, the colon, the pancreas and all that. I would argue that the differences from tissue development, which is what this is, are far greater than the differences among the members of Congress or the differences between the human and the chimpanzee; one tissue is just profoundly different from another. So how do we explain that? Well, there are 3 billion base pairs of DNA here also, and as far as we know there are zero differences in DNA sequence that determine this. The jury's still a little bit out on that, but it doesn't seem so, and the things we do know about, like telomere length changes and immunoglobulin gene rearrangement, are important, but they're not necessarily what's causing these changes. So we don't think there's anything at the level of sequence that's responsible, and that's what epigenetics is about. It's information that a cell remembers, like the liver knows
it's a liver, information other than the DNA sequence, and it controls tissue-specific developmental programs. I think that's a fundamentally important thing, and a lot of what we do goes back to the early days of Darwinism, and to Darwin in particular. I've gone back and read Origin of Species a couple of times over the last year for another reason, and at the end he actually talks about how he wishes he understood developmental biology, because he thinks there's something very important about heredity and developmental biology, but since we don't know what that is, he can't address it. Reading it, you get the feeling that he's very frustrated and wishes he could live another 50 years, because he doesn't want to let that go. So what are examples of epigenetic change? Besides the DNA sequence that you know about, the A, G, T and C, there's this, extremely beautifully drawn here: a carbon with three hydrogens, a methyl group, and adding it is called DNA methylation. You hear a lot about that, and we work a lot on it. It's a chemical change in DNA, and there's a mechanism for remembering what those changes are when a cell divides. Now, I'm personally very interested in disease-related variation, and many common diseases seem to involve developmental defects in the same organs I told you about. Cancer involves developmental changes, in the stomach for example: in stomach cancer, those cells aren't differentiating normally. There are studies on schizophrenia in the brain suggesting that there might be developmental differences, either by imaging or in terms of the pathways for some of the genes that have been found. Kidney disease related to diabetes generally appears to involve metabolic developmental changes. For a number of organ systems you can make a developmental argument, and you also have to add to this the role of the environment. So let's say you have the perfect genome, and you get the perfect man here, David, the Michelangelo statue, and you give him a double super whopper
burger or something, and you get this. There are developmental problems that occur because of our environment that would not necessarily be affecting the genome, but we know that diet and other environmental exposures can affect your epigenome. So I'm just going to tell you very briefly: our lab, as well as many other labs, is now very interested in pursuing this. We're starting to use sequencing for this, but you heard about the cost, so if we want to look at thousands of samples, we're still using methods that are cheaper at the moment, involving arrays, to look at slightly under half of all the sites that can get methylated across the genome. Here are just a couple of the things that we've observed, and I apologize, it's a data slide. This axis is a measure of DNA methylation, and here is just a region of the genome, about 2,000 building blocks of DNA. This is from a study we published last year that shows a very strange thing. If you did that autopsy (and we actually did, on people who had given permission before they died to be studied in that way), you would find a lot of regions, many thousands of them, where there's more methylation in some tissues and less in others, or vice versa. But when we look at a cancer, in particular colon cancer, what's happening is that the pattern of DNA methylation is changing and it's resembling the wrong tissue. In general, we're finding that many cancers have a methylation change that's actually a combination of their normal methylation pattern and the methylation patterns of other tissues, which are developmentally wrong. That would fit the picture I was describing a little bit ago, of how disease may be related to developmental defects. So, in a way, cancers are thinking they're the wrong tissue, and that would be an epigenetic change related to development and related to
disease. And then this is just some unpublished data that we're hoping to get out soon. Here's another one of those little pictures showing higher and lower methylation, in a group of individuals who were followed in a cohort, in collaboration with Dr. Gudnason at the Icelandic Heart Foundation and with Dani Fallin, who's an epidemiologist at Hopkins. What we're seeing is that there are regions with a great deal of variation in the normal population, but they seem to segregate normal from obese individuals. So it may be that the Michelangelo statue problem is manifested to some degree in epigenetic differences, and those differences may become a little greater in the same individual as people get older. The gene I'm showing here in particular happens to be a gene for development. This is very new, it's not even published yet, but this sort of thing can lead to new insights. And as I say, there are many laboratories; it's easy to talk about your own, but, for example, there's the Roadmap Epigenomics initiative, which you can see on the NIH website, and there must be at least 50 labs doing experiments like this with NIH support. So that's about it. Thanks. Okay, so this panel is now open for questions. That is one short microphone, isn't it? That's all right, we can fix that. There you go. Thank you so much for your excellent presentations. I'm Katherine Tallmadge with the American Dietetic Association and Personalized Nutrition. Where does my...
You must have liked that hamburger analogy, I bet. The whole idea of how genes respond to the environment, to diet and to exercise, is fascinating to me. I wrote a major piece in the Washington Post a few years ago about how exercise turns on certain genes which, when they're turned on, clear fat and sugar from the bloodstream quickly and efficiently, but you have to exercise every 24 hours for it to work: the GLUT4 and the LPL genes. Anyway, so that's exercise. Vitamin D: I understand that vitamin D is needed to help DNA work in absolutely every cell. Can you explain that? For instance, apparently vitamin D is necessary for renin; it affects hypertension, it affects the renin hormone, it affects insulin, it affects every cell in our body, the DNA in each cell and how it works. Let me try. I don't know anything specifically about vitamin D and the effects you're talking about; I'm not saying it's not correct, but science is so broad that it's difficult to keep up. I think the question you're bringing up is in fact one area that's increasingly getting attention. For a long period of time we've gone through this not very useful debate about nature versus nurture. It's much more interesting, and logically it has to be the case, to ask how our genes, and more importantly our physiology, as in your first example, respond to environmental factors. We're now all using the word environment in a very, very different and very broad way. So there's no doubt that it's not only exercise (exercise, by the way, has been shown in a variety of studies to affect many aspects of insulin regulation, for example); there must be many other environmental factors, diet being one of them, and I'm sure temperature and other aspects of the local environment are important. Many such things not only have an immediate proximal effect, such as what you're talking about, but many are known to have very
long-term effects. There have been some early studies, for example, on the survivors of the famine in Holland in World War II, showing that for some genetic markers imprints have been left, in this case, I think, in their offspring, from many, many decades ago. So there's no doubt that these effects exist, and it's only recently that we have the tools to look at various kinds of protein markers, metabolomic markers and other markers together with DNA sequence, to figure out the interplay between so-called environmental factors and genomes. I don't know whether that answers your specific question; I can't tell you anything about how vitamin D affects replication, which I think is what you asked. Other questions? My name is Larry Thompson; I'm the communications director at the Genome Institute. We've been following this debate about dark matter and all that. Aravinda has talked very eloquently about going deeper into the sequence and looking more at the sequence variants that are there, and Andy is talking very much about the kinds of modifications on top of the sequence. I was wondering if the panel as a group could give us a sense of this. The idea all along, of course, has been that you could sum up all these different changes, and out of that summing up of the variants and changes you would get the phenotype. Probably naively, at least in my case, I thought it was going to be 30 or maybe 40 different things across the genome that led to diabetes or heart disease or whatever. Now that we're getting further and further into these studies, we're seeing that it's complicated; I guess we should have expected that. Can you give me your sense of how much variation, how many changes, will have to be observed to sum up to a statistical prediction of what your risks really are for whatever common disease you're looking at? I think we all could probably
talk for quite a while about this; in fact, we did, earlier, last spring. But there are probably going to be many things. The one thing I think is that scientists don't have a crystal ball, and the good ones have a sense of humility about their own work and also about their guessing power. There are a number of things that could probably contribute to the perceived gap between what we measure as heritable risk, like Aravinda was talking about, and what we find when we look for variants and try to see if we can add them up to account for disease. We're going to need sequencing at higher resolution to find tiny changes and look for rare variants, and there's a very promising area that has to do with copy number variation. There's another idea, which we published earlier this year, suggesting that there might be heritable variants, V-A-R-I-A-N-T-S, in your DNA that contribute to phenotypic variance, V-A-R-I-A-N-C-E, in a stochastic way. That was published in PNAS in January, and we're pursuing it pretty aggressively in some of these populations: there might be some degree of stochasticity in itself, and that degree of stochasticity might be controlled genetically. But we all have our pet theories about this, and I think everything is going to contribute to some degree; we just don't know which ones, or how much. So I think it's important to distinguish two parts to the question you asked. One is that it is entirely possible, feasible and probably likely that there are hundreds of genes with genetic variants that affect a particular phenotype. I think the current estimate of the number of genes that control height is probably several hundred, and that's fine. And you used the word complex. Well, everything in biology may be complex; physics is very complex, but there are many aspects of physics that are well described and theoretically
explainable. So I think that at this stage, when we use the word complex, we often mean that the physiology is complicated and the genetics is complicated, but honestly, we are much more ignorant than the stuff is complex. So I have no doubt that even if there are 300 things, once we are in a much better position to understand them we will have a much more satisfying view of what's going on. The other part to remember is that there could be hundreds, but despite there being hundreds of things, there may be tens of things that are rate-limiting, that are manipulable experimentally as well as therapeutically. Just because there are 300 sites of variation that distinguish me from somebody else in the risk of hypertension doesn't imply that there are 300 sites at which my physiology needs to be tweaked for my blood pressure to be normal. We know that's not the case, because a third of people do exceedingly well on a single blood pressure medication. The reason that's important: if you look at molecular biology, even replicating DNA is extremely complicated, but scientists have recapitulated replication in a test tube now for I don't know how many decades, right? If we couldn't do that, we couldn't have figured out that it sometimes happens in reverse. So I'm quite confident that we will understand enough, and we already know enough: we need to get to the genes, and we need to understand all these other mechanisms that are coming up, from the comparative studies as well as the epigenetic studies, so that we will be able to manipulate and understand things, first experimentally and then physiologically. The 300 need not deter us or scare us. The idea is to understand more, and my fear is that we are finding more things than our capacity to understand them at the same pace. Andy, how certain are we that we know all the different chemical modifications of DNA, or even all the decorations, sometimes
they're called modifications? In epigenetics we sort of focus on a small number of things. Do we have any idea what the full universe of DNA decorations might be in the epigenetic world? That's a great question, and the answer is we don't. In fact, the number of modifications of DNA outpaces our understanding of the mechanisms for replicating those changes. At the moment, the only really well-understood mechanism for copying non-sequence information is DNA methylation: there's an enzyme that we understand extremely well, and we've known about it for 25 years. That's not to say that histone modifications, chemical modifications of histones, aren't heritable during cell division; they clearly are, and there are some very promising models for how that's done. It's just that the basic biochemistry has not been worked out. It's hard to believe that all of the histone modifications now known, in 2010, are independently replicated during cell division; some of them must be dependent on others. So I think the really key thing is to figure out the mechanisms of copying the information and then focus on those, because they're going to be primary. But there might be many more out there that we haven't even discovered. That's right, and the role of RNA in all this is also something that people are just beginning to understand. So by the time we get our heads around sequence variation, will we recognize that it's trivial compared to the other variation that's probably out there? Or perhaps not? Well, in a way. Although, I can just say, as an epigeneticist, we've profited so much from what's happened in terms of genome sequence. It seems that so much of what we do in methylation and chromatin is driven by DNA sequence that, as we learn more, those other things will be understood as well. But that's the technology that drives the discovery; it doesn't mean you completely understand the choreography of it. That's correct. Lisa, do you want to add something? I just want to make a very important point: you asked about integration. So when we
find out that there are 20, or 200, or 300 variants affecting something like type 2 diabetes, it's not that they're all going to act one at a time so that you can just add them up. We're going to need to know how they all interact with each other and how they interact with the environment. So if you're doing something like 23andMe, you don't want to know about them one at a time; you want to know what your risk is. Is it going to be higher or lower? It's going to be based on the information from the whole set of variants plus some environmental contributions. But getting to this question of what's missing in explaining the genetic contributions to disease, I actually have a very simple answer (it may be expensive, but that doesn't mean it's not simple): large sample sizes. Large sample sizes get you a lot of information, because in these disease studies, by having a lot of people to look at, you can find a lot of rare variants. A lot of people say it's rare variants, but I think with large sample sizes you'll also be able to detect small effects, and certainly some of these contributions are going to come from perfectly common variants that have small effects rather than large ones. If you have a large sample size, you'll be able to detect those small effects. The other thing is that with a large sample size you'll be able to detect interactions of genes with each other and with environmental factors. So, Eric, as a funding agency here, this is something that's actually very important: by having very large sample sizes in these studies, you'll be able to detect a lot of this, quote, missing heritability. There are a lot of possible explanations, and they will all be helped by that. And just one other point related to this, which I think is a really good point, and I have had this discussion before: a really terrific model system for studying human diseases is human beings. We've been around for a long time, for so many generations, but we don't
get to control the patterns of mating and all the rest of that. But I think it's very well worth our exploring in great detail the nature of phenotypic heterogeneity in large populations and how that might be related to exposures and so forth. I think we need to do it to get the answers. I agree with you. I want to thank these three panelists for their contributions. We're precisely 34 minutes behind, and we're going to try to make some of this up. We're going to take a 10-minute break now; we're not inhumane, so we will take a break, but we will try to reconvene here