All right. So with that, it's my pleasure to introduce to you today our speaker and one of my fellow course organizers, Dr. Eric Green. Eric, as many of you know, is the Director of the National Human Genome Research Institute here at the NIH. Prior to his appointment to the directorship, he was the scientific director of our intramural program dating back to 2002, and I had the pleasure of serving in the role of his deputy during that time. He was also the founding director of the NIH Intramural Sequencing Center, a state-of-the-art DNA sequencing facility that has played a really critical role in the advancement of genomic science, particularly in the area of comparative genomics, and you're going to be hearing a lot about that throughout this particular series. During the almost two decades that Eric spent directing his own research program, he and his group made major contributions towards our understanding of the human genome, having had significant involvement in the sequencing of the human genome since the very beginning of the Human Genome Project, and then having developed technologies and strategies for the large-scale analysis of vertebrate genomes, providing us with interesting and seminal insights about genome structure, function, and evolution. In recognition of his work in the field of genomics, Eric has received numerous prestigious awards and has been inducted into both the American Society for Clinical Investigation and the Association of American Physicians. Today Dr. Green will be presenting his perspective on the current genomic landscape, thereby setting the stage for many of the topics we're going to talk about throughout this series. As those of you who've had the opportunity to hear Eric lecture in the past know, he's an absolutely wonderful speaker, and I'm sure you'll enjoy today's talk very much. So with that, please join me in welcoming my longtime colleague and today's lecturer, Dr. Eric Green.
Thank you, Andy, for your kind words, and let me add my own welcome to all of you for the kickoff of this 12th series. Andy and I started this as just a brainstorming idea back in 1995; I can't believe how long that's been. Tyra and Andy are being kind to give me any credit for organizing the current series. Believe me, they did all the work for this. But I have been involved in lecturing in this series all 12 times now, and it's interesting that every time I go to prepare my new lecture, I tell myself it was only 18 or 24 months ago that I gave the lecture, so there's not much more to add, and it always strikes me as truly remarkable how much new material I need to add, although there's a lot of foundational stuff to always include. There are always so many new developments when you go to paint the landscape of genomics, such a fast-moving field. My assignment was to give a current view of the genomics landscape. I will admit it's going to have a heavy emphasis on human, so I'm really going to mostly emphasize painting the human genomics landscape. Before we get into that, I should remind you, as you might imagine, that I'm an institute director, so I'm financially quite boring: I have no relevant financial relationship with any commercial interests, et cetera, et cetera. So in painting what is a very, very vibrant landscape, what I'm going to try to do for about the next hour and 10 minutes or so is cover a number of areas. First, I want to set a historical context for genomics. I then want to talk about the major achievements that have taken place since the end of the Human Genome Project, which will allow me to then describe the human genomics landscape, in particular the way I think it looks now and, importantly, into the future.
The other major goal of my lecture in painting this landscape, which is why they asked me to kick this off, is that I am going to try to place every single future speaker in this series into a broader context within this landscape. I will name every speaker and tell you exactly where I think they fit in and where they will expand upon what I basically only have time to barely touch. So with that in mind, let's just launch in. If you're going to paint a vibrant landscape, you certainly might imagine that there was a lot of brilliant work that took place before any of this could ever be crafted, and indeed I would say that there are some major foundational milestones and some real iconic figures that helped facilitate the growth of genetics and genomics as the disciplines that we know today. I could probably spend two or three hours talking about this, but just to briefly summarize, you'd have to name Mendel as one of the key players in this, really understanding some of the basic rules and properties of inheritance. Clearly, Miescher, when he discovered DNA as a molecule, deserves credit for helping to facilitate the growth of this field. And then of course there were Avery and co-workers, who did all those weird experiments of taking bacteria and boiling them and not boiling them and injecting them into mice to figure out exactly which ones would cause infection and death, and who therefore figured out that DNA was almost for certain the inherited material that was so important. That then set the stage for what took place in the 1950s with Watson and Crick's discovery of the double helical structure of DNA. Shown here is that Nature publication, arguably the most significant publication of the last century, at least in terms of the biomedical research arena.
The double helical structure of DNA really provided the last piece of the puzzle needed to make us understand how it was that DNA could be the information molecule, carrying information from one cell to the next and one generation to the next. Once we understood its structure, we could understand its properties as an information molecule. That brilliantly set up, in the 1950s, a set of studies that then carried over into the 1960s to begin to untangle how it was that DNA actually functioned. For example, that led in the 1960s to the elucidation of the genetic code, something featured here on this campus by the work of Marshall Nirenberg, leading to a Nobel Prize for him and others. It provided us the key insights about how it was that DNA encoded information for making proteins, and therefore the genetic code, the lookup table that all of us learned about in intro biology, became key to our understanding of the role DNA plays in encoding information for proteins. This later, of course, led to better and better technologies for studying DNA. The 1980s, of course, one needs to credit as the molecular biology revolution, when DNA cloning came to be. Certainly that's when I was a graduate student and a medical student, for example, and that was the rage: learning how to manipulate DNA in the laboratory, learning how to clone DNA, and of course learning how to sequence DNA and improve on that sequencing, which was so key for being able to study DNA in the kind of exquisite detail that we wanted to. This, of course, laid the foundation for beginning to look at our ability to comprehensively study our own DNA, human DNA, as well as all of the DNA of an organism, which of course is its genome. And actually, one thing I thought was interesting, and I've reflected back on this in recent months, is that I've now been involved in genomics for about 25 years.
I started literally about a year after I graduated with an MD-PhD degree, and that was 1987, and I wondered why it is that in all of medical school and all of graduate school I never heard the word genomics. It is a very young discipline, so I actually did a little bit of digging to figure out when the first usage of the word genomics was, at least in the scientific literature, and how come I had never heard about it in graduate school or medical school. A little bit of digging revealed that the origin of the word genomics truly was 1987, the year I actually graduated. In fact, what I could find was that the first use of it was with the creation of this new journal called Genomics, which I'm very familiar with. This was the opening editorial for that journal, in which, among other things, they talked about the name, and they said that for this newly developing discipline of genome mapping and sequencing, including the analysis of the information, we have adopted the term genomics. The new discipline is born from a marriage of molecular and cell biology with classical genetics and is fostered by computational science, very similar to exactly what we're going to cover in this series of lectures as part of this course.
This was, as far as I know, the first usage in the scientific literature of the word genomics, and I dug a little deeper to find out more. Well, somebody still must have come up with a name to create the journal, and it turned out that, as is often the case, Bethesda and NIH played a major role in this. A little bit of digging revealed a historical piece written in the Journal of the National Cancer Institute much later, looking back, that basically talked about a workshop that was held, sponsored by NIH, here in Bethesda. A guy named Thomas Roderick at the Jackson Lab was a participant in the workshop, and over beer that evening after the workshop, when all the participants went out and had dinner and beer and so forth, he was credited with coming up with this name. We needed a name for this new emerging area of comprehensively studying the DNA of organisms, and the word genomics was born. So, interestingly enough, I think it does go back to about 1987 or so when genomics first came to pass. The reason it was important to come up with a name for this, of course, was that the late 1980s were a time when scientific leaders were strategizing about the notion of comprehensively analyzing, mapping, and then sequencing the human genome. Of course, that gave rise to this monumental project called the Human Genome Project, which was formulated in the late 1980s, about the time genomics as a word was coined, and then launched in October of 1990. I actually was fortunate to be a participant in the project from day one, as a frontline participant, as a postdoctoral fellow, and I will tell you that when the project started we had no idea how we were going to do this. There was an amazing amount that had to be figured out about how to map DNA efficiently and eventually how to sequence DNA at a scale that would support the sequencing of something as large as the human genome. But it was a remarkable international endeavor that galvanized the interest of thousands of
scientists around the world, and the rest is history: remarkably hard work, dedicated efforts, and remarkable creativity, leading to the completion of the project about 13 years later, in April 2003, which is now about 11 years ago. In fact, last year we celebrated in various ways, including in this amphitheater, the ten-year anniversary of completing the Human Genome Project quite successfully. Well, there's been incredible growth in the field of genomics since the end of the Human Genome Project. There have just been so many applications of genomics, many of which I know you're familiar with or hear about or read about, and many of them aren't even directly in our main area of biomedical research. In fact, the areas shown here have all been greatly advanced by the tools and technologies and information associated with genomics, and in some of these disciplines genomics is having a very major impact. This is not what I'm going to talk about; it's not the mainstream of the human genomics landscape that I wish to paint. What I'm going to paint, not surprisingly, being at the National Institutes of Health, is the application of genomics to health, disease, and medicine. Needless to say, this isn't entirely surprising, because as soon as the Human Genome Project ended there was significant interest in using this foundational information about the human genome to start to think about how to improve the practice of medicine using genomic information. Whether you looked in the popular press, such as the New York Times, or the scientific press, such as Science magazine, terms like genomic medicine started to come to the fore. There were entire series in major journals dedicated to articles on genomic medicine, entire journals established, such as Genome Medicine, and then major textbooks created that said either genomic medicine or personalized medicine or one of these other terms that are largely synonymous with genomic medicine. This quickly became a vibrant and
important area to think about in terms of applications to clinical medicine. So, like a drumbeat, one might imagine, there was significant interest in the field in using genomics in ways to improve human health. The term I tend to use, and I think that NHGRI tends to use, is genomic medicine, again largely synonymous with individualized medicine, precision medicine, personalized medicine, and so forth. We take a very narrow definition of this, and this is what's shown here: we regard genomic medicine as an emerging medical discipline that involves using genomic information about an individual as part of their clinical care, for example for diagnostic or therapeutic decision-making, and the other implications of that clinical use. This definition is very much a key focus of what is going on now in human genomics research, both here at NIH and really in many ways around the world. The way I like to think of this, and I do think about this a lot, is that we have a long journey ahead of us, and that journey really began with the end of the Human Genome Project, which in many ways was the starting line for this very long journey. I think one day we truly will realize genomic medicine, and we need to define and then traverse that path and the various steps along the way. Some are known and straightforward, some are known and really hard, and some we simply don't even know yet. We go into a journey like this optimistic, because we were wildly successful in the genome project. We've got to believe we're going to be successful at realizing genomic medicine, and when we are successful, I think we'll have fulfilled the promise of why we said we should do the Human Genome Project in the first place. So how do we pursue such a journey? How do we define that path? How do we make those steps successful ones? Well, I'll tell you that the field of genomics is very much driven by very thoughtful and strategic planning. It goes back to what I said earlier about the Human
Genome Project. When the genome project began, we had a goal, but we had no idea how we were really going to attain that goal, and what we did was go through a series of strategic planning processes, almost every two to three years, to define how to accomplish the goals of the Human Genome Project. That has very much become part of our culture: when we see what needs to be done, we have a strategic planning process and then articulate that vision in a very transparent way. So, thinking about this journey from the base pairs of the genome project to the bedside of patients, or, if you prefer the metaphor, from helix to health, literally the day the genome project ended NHGRI published a strategic vision on behalf of the whole field of genomics that articulated the next phase for genomics research, having in hand a sequence of the human genome. It was a very good plan that I think served the field and the institute and NIH well, but because of accomplishments, it turned out it wasn't even going to last a decade. So by 2011 we underwent another major strategic planning process, again engaging the field very broadly, including international scientists, hundreds of scientists in total, and then articulated a strategic vision, now several years ago, that described the next phase for human genomics research. In particular, you can note that it has a major emphasis on actually applying genomics to clinical medicine and even uses the phrase genomic medicine in the title. If you've not seen this strategic plan, which I'm going to describe to you in a little bit of detail now, I would urge you to jump to this URL and freely download it, or certainly you can get to it by going to Nature, but a convenient URL is right there for you to find that strategic plan. Interestingly, what we heard in 2010, leading up to the strategic plan in 2011, was very different from what we heard in 2002, leading to the 2003 strategic plan. The
difference was that now, for this plan, it was time to be more specific and more sophisticated in describing how it was you were going to actually apply this to clinical medicine. At the end of the day, we found it very useful to describe a research agenda for the field that had all the programs and projects, or most of them, falling into five major domains of research activities. Let me introduce you to those five domains, because they really become a framework within which nearly everything we're doing falls. The first domain was very familiar to us: doing genomics research to understand the structure of genomes, how genomes are put together. That's really what our origins were in the genome project, so this is a very comfortable, familiar domain to us. Of course, the next domain uses the knowledge of structure to understand function: using genomic research to understand the biology of genomes, how they work. With knowledge of how genomes work, you can start to understand how differences in genomes might play a role in disease, so the middle domain is using genomics to understand the biology of disease, gaining insights about genomic variation as it pertains to disease. With that knowledge, one could imagine then using genomics to advance medical science, and so that domain is all about advancing the discipline of medicine. Of course, we were told don't stop there, that there's also a responsibility to make sure that you can actually change healthcare for people, and the delivery of healthcare is a pretty complicated thing. So it was also important to do research to demonstrate that you can actually improve the effectiveness of healthcare using genomic approaches. So these five domains really represent a nice progression from more basic-science-oriented research to more clinical research, but also a logical progression of how to go from basic knowledge of the genome to actually changing the practice of medicine. So let me tell you now about a series of steps that have been pursued, and
in many cases quite successfully, in going across these five domains, that have happened since the end of the Human Genome Project in particular. The first step really speaks to the first couple of domains, which is understanding the function of the human genome sequence, going from structure to fundamental functional knowledge about the genome. Now, you may say, wait a second, why did you need to do that? Isn't that what the Human Genome Project did? Well, let me just remind you that the Human Genome Project really was about first mapping the human genome, getting organized, and then sequencing, reading out the three billion letters of the human genome sequence. It was not about interpreting the human genome sequence. We always knew that that was going to be something that would have to happen after the Human Genome Project, and something that was likely going to be incredibly difficult, and indeed it is incredibly difficult. So really, the genome project simply produced this, although this is only 0.001 percent of what the genome project produced. It produced an ordering of the G's, A's, T's, and C's. We then needed to go in and spend what will probably be decades interpreting what those letters actually mean. When the Human Genome Project ended, we have to be candid here, our tools for actually understanding the meaning of the sequence were quite nascent, shall we say. There were some things we understood well, and there were many things we didn't understand well and still don't understand well. What did we understand well? Well, one of the things we understood well was genes, because at the point when the genome project ended we knew about introns and exons, we even knew about alternative splicing, and we knew how those exons got shuffled around to produce different kinds of transcripts. Importantly, we also knew the genetic code. So looking at DNA sequence, especially computationally, and trying to figure out where the genes are, well, it
wasn't simple, but it was at least approachable. And so that was the first thing we were able to do quite effectively: to start to develop catalogs of all the genes, albeit with lots of complexities of alternative transcripts and so on and so forth. But fundamentally this is where we could get the biggest head start, and so off we went, and many investigators worked very hard to develop gene catalogs, highlighting those parts of the human genome that directly code for proteins, the protein-coding genes. Now, of course, there's a lot more complexity associated with what is now known to be a catalog of about 20,000 genes, and a lot of that complexity has to do with gene expression and all the regulation that goes on, and with figuring out where, when, and how much those 20,000 genes are turned on. We're going to bring you an expert in Paul Meltzer, who in May will come here and talk about gene expression in particular and the methods and strategies associated with studying gene expression. But of course we needed to move beyond genes, and we had a suspicion there was a lot of complexity associated with other parts of the genome that didn't directly code for protein, but we really didn't have tools for understanding how to get at those. We didn't have a genetic code for them. We looked at our wonderful iconic figures and saw all the things they had taught us, and realized that, wow, we needed a different consultant to help us with the non-coding parts of the genome and how they function. Ironically, the best consultant available to us at the end of the genome project was someone who predated all these individuals, because the best lessons about what we needed to do to study the genome were provided to us by this individual, Charles Darwin. In fact, the next major phase of interpreting the human genome sequence really required a fundamental understanding of the properties of evolution that Darwin had taught us. Among the many things that have been attributed to Darwin, he pointed out how
it's not the strongest of the species that survives, nor the most intelligent that survives; it's the one most adaptable to change. Fundamentally, over the course of millions of years of evolutionary time, genomes have been tinkered with through evolutionary forces. Some things, when they get tinkered with, work, and some don't, but some things just don't change at all, and if they don't change at all, it's probably because they confer important biological properties that cannot be messed with. In that way, studying other genomes might give us clues about what can and cannot be tolerated to be changed. It is the reason why a very prominent genome researcher, Eric Lander, made the comment a number of years ago that for the past three and a half billion years, evolution has been taking notes, and in fact all those notes are scripted in the genome sequences of other species. And so the idea came to the fore: well, we could understand the human genome sequence better if we could simply compare it to other genomes and figure out what has and has not changed, reading those notebooks of evolution, which would be able to teach us key lessons. Of course, humans are just a small little twig on this very complicated phylogenetic tree of vertebrates, and it became very obvious that we could sequence, and we could actually sequence better than we could readily understand the non-coding parts of the human genome. That is the reason why, even before the genome project ended, we were off sequencing the mouse genome and the rat genome, because they were laboratory models, but also companion animals like the dog, and our closest relative, the chimpanzee. But we also knew, from some of the lessons Darwin had taught us, that we needed to reach wider across the phylogenetic tree, and so many more species got added after the Human Genome Project ended. In total, there were about 30 really good quality genomes, at least in draft form, that were generated, enough to do the kinds of
analyses that were required. The idea was to take critters off of representative branches of the phylogenetic tree, to get as much statistical power as possible by surveying widely across all vertebrates, and all mammals in particular, and then feed all that data into a computer and simply start looking and ask the question: when you line up all those sequences, what parts are the most conserved relative to the human genome? Because those are almost for certain going to be the parts of the human genome that are functionally important. That has given us tremendous insights, and taking a very large body of work, let me just summarize it in a few slides. Just by the numbers, we now know that about five percent or so of the human genome sequence is constrained basically across all mammals, and if it's that heavily conserved or constrained, it's almost for certain going to serve some functional role. So that's five or so percent. Now, that's about five percent of three billion letters; it's about 150 million bases that are found in the same position in virtually all mammals. It's probably a lower bound for the amount that's actually functional. In fact, we know it's a lower bound, but it certainly is the most important initial five percent to characterize and understand. But only a small fraction, only about a third of that, actually directly encodes proteins. Of that five percent, only a third or so is protein-coding genes. Now, this corresponds, as I told you, to about 20,000 genes, and of course lots of things happen with our genes. It gets much more complicated: alternative transcripts, post-translational modifications, and so forth. Clearly we make, as a species, many, many more proteins than that, more than 22,000 proteins. Our complexity comes in beyond our gene number. But that's only about a third of that most conserved five percent. What's the other three and a half percent of our genome that is conserved to the same degree as genes? What's
it doing? Of course, it's functioning in biological ways other than directly coding for proteins. Now, we have some insights into what it's likely doing; we just don't have complete insights yet. Among the non-coding functional sequences, there's clearly a lot that is dedicated to gene regulation and all the complexities of enhancers and insulators and silencers and promoters and so on and so forth. Many of those are highly conserved, and these are all functioning in ways other than directly coding for proteins. But we also know there are sequences involved in packaging up our DNA into chromosomes, sequences involved in making sure that chromosomes segregate properly, and sequences that play a vital role in accurately replicating our chromosomes. Oh, and by the way, then there's that whole world of non-coding RNAs that is just exploding in biological complexity, and of course those don't directly code for proteins. They're within that three and a half percent that represents non-coding functional sequences. And finally, we should acknowledge there's stuff we just don't know. It's functioning in ways that we haven't even figured out yet, and certainly it is not described in textbooks yet. We need to find those sequences, characterize them, and eventually catalog them comprehensively. So there is this additional three and a half percent of the most highly conserved, almost certainly functional sequences that don't directly code for protein, and as I alluded to, they're regulating genes, they're helping chromosomes function, and then there are some undiscovered functional elements as well. So that's one challenge: understanding the functional elements, coding and non-coding, within the genome. But that's not the only story, of course, because our genome is even more complicated than that. There's a whole other language out there that's not the order of the G's, A's, T's, and C's, but rather it's the way DNA gets decorated by methyl groups and gets associated with histone
proteins, and this whole other code of epigenomics, which increasingly we are learning plays a major role in genome function. When Laura Elnitski comes here in March, she's going to talk about each of these last two topics: epigenomics and epigenetics, but also all these complexities of gene regulation, all very relevant in terms of the non-coding functional sequences in the genome. The reason why epigenomics has become extremely hot lately is that we now have methods to read out that code. These new methods for sequencing DNA, which I'll get to shortly, can be used to read out marks on our DNA and epigenomic modifications of our DNA, and that technological advance has greatly accelerated the pace of our understanding of that epigenomic catalog as well. We recognized that in order to get a comprehensive view of all of these coding regions and non-coding regions that are functionally important, it was important to have a very large effort dedicated to cataloging those elements. That has happened through a series of projects: ENCODE, which stands for Encyclopedia of DNA Elements and is focused on the human genome and mouse genome, and also modENCODE, for model organism ENCODE, concentrating on Drosophila and nematode. Large consortia have been working on those. In particular, the ENCODE consortium continues to work quite aggressively at understanding the human genome sequence and developing rich, encyclopedia-type information about all of the functional elements and all the epigenomic modifications associated with genome function. And of course all this data is made publicly available, and I hope all of you take advantage of this. You can go to appropriate browsers, and you'll learn about some of these, and bring up remarkably rich amounts of data, such as shown here for two regions of the human genome: almost overwhelming amounts of data, computational data and experimental data, that tell you all sorts of things about conserved sequences, expressed regions of the genome, regions that are
bound by transcription factors or other proteins, regions that have open chromatin, gene models, and so on and so forth. I like to think of this almost like a GPS, if you will, in your car. We have the fundamental sequence as the roads, but then the GPS annotates that with other information, like where is the closest gas station and where is the closest place to eat. These kinds of views tell you where is the closest gene, where are the closest transcription factor binding regions, and so on and so forth. Increasingly, we're trying to develop tools to allow you to navigate this massive amount of data to come up with the clearest interpretation of what the important functional regions are in a segment of the genome that you happen to be studying. But, like anything else, we recognize that it gets more complicated than that, because it's not just DNA as a linear molecule, with its G's, A's, T's, and C's and its various epigenomic marks, that is conferring function. There is a three-dimensional life to DNA in chromosomes in the nucleus of cells, and it is now being recognized increasingly that the genome has three-dimensional properties that confer function, and that different regions of the genome, on different chromosomes, actually get together sometimes and transmit information in ways that we need to understand. Better and better methods are now becoming available to detect and eventually understand what those interactions are. So the bottom line is this is a whole new frontier of genome function that we need to understand and that we're trying to study. So, to summarize this first step of understanding all of the functional parts of the human genome, I sometimes joke, especially with the younger generation, who love SparkNotes apparently, that this is where we are right now. We sort of have a SparkNotes view of the human genome. You know, maybe it gets you ready for the test tomorrow, but it's not going to get you ready for the future of biomedical research. We will be
interpreting and cataloging and hammering away on the human genome sequence for decades to come. There's a lot of complexity in those three billion letters. That said, in the past 11 years we've made remarkable progress in getting the first-generation catalogs up to speed, and they can actually be quite helpful. Before I move on to the next phase of research accomplishments en route to genomic medicine, just as an aside: the other thing that comparative genomics, as I've described it, has allowed us to do, in addition to understanding the function of the human genome sequence, is to teach us a tremendous amount about human evolution, both with respect to our closest primate relatives but also with respect to our closest relatives that are no longer here, such as Neanderthals. And that's a whole topic in and of itself that is way cool, especially for those of you interested in evolution. I'll put in a quick plug, because it'll be very special to have a real world expert here on this campus as part of a lecture series that we're putting on associated with Sainel's. Svante Pääbo, who is a world leader in understanding and sequencing things like Neanderthal genomes and so on and so forth, will be here in March. You can see a little advertisement there, over at a different auditorium here in Building 10, and I strongly urge you to try to get to this talk if you're interested in what he is learning by doing some very fancy sequencing of specimens of Neanderthals and other early human ancestors. So that's an aside. Let's get back to the topic: we want to get you closer to genomic medicine. Well, the first step, understanding how the human genome sequence works, teaches us something about how a generic human genome might operate, but of course what we really want to understand is how each of our genomes works, and in particular we want to understand eventually how our patients' genomes operate, and to understand that we need to understand how they're different. So the second
major area of accomplishment since the end of the human genome project has been in understanding human genomic variation. This is again mostly research, in these first two domains among this five-domain progression. Now what is known? Well, what is known at a very basic level, and was known even before the genome project ended, is that each of us has a difference in our sequence at about one out of a thousand bases compared to the person sitting next to you, for example. So these variants, which I show sprinkled across a given stretch of DNA, are fairly frequent, about one in a thousand. But it turns out that the great, great, great majority of those variants really are inconsequential in terms of any sort of phenotypic effect, but a subset are consequential. Some of them are not good variants: they may confer a risk for getting a disease or may overtly cause a disease. And others might be beneficial: they may make you resistant or less vulnerable to getting a disease, or maybe they would make you a good candidate for using a particular type of medication. But of course we want to understand which is which, which ones we can forget about, which ones are not so good, which ones are good, and so forth, and we'd like to have a catalog. And you know us genomicists: when we want a catalog, we put together big science efforts and off we go. And indeed it was recognized that it would be incredibly valuable to have catalogs of at least the most common variants that exist in the human population. It turns out that it's not that all your variants are private; in fact, most of the variants you have are shared with lots of other people. Wouldn't it be nice to have that information and have scientists be able to analyze those and figure out which ones are medically relevant? So off we went. First out of the gate was a consortium called the SNP Consortium, the Single Nucleotide Polymorphism Consortium, to start cataloging some of the more common single nucleotide polymorphisms. That led to a new international effort, a consortium
called the International HapMap Project, which aimed to not only develop information about the most common variants but also to start to develop information about what's known as the haplotype structure of the human genome. Because it turns out that our variants don't just go from one generation to the next at random; rather, there are neighborhood blocks on each of our chromosomes that each contain a number of variants, and those blocks tend to be inherited as one entire block from generation to generation, and that has some really important implications that future speakers will be talking about. This was a very successful consortium and put out several major publications in Nature. But then, when new technologies became available, it was very clear we needed to be even more aggressive about cataloging the most common genomic variants across human populations, and that gave rise to the latest project, called the 1000 Genomes Project, another international effort, which drew on a collection of DNA samples from now not just a thousand people. We were overachievers: we said it was a thousand genomes, but now it's over 2,500 genomes, and they were collected from 26 populations to capture geographic diversity across the world, making those DNA samples widely available and making all the information about these variants also widely available on the internet. The 1000 Genomes Project came out with its pilot effort publication in 2010, and a couple of years ago reported on the first thousand, and later this year will be publishing a major paper reporting on the full 2,500 samples. And what do we have? Well, we now have data for over 90 million genomic variants that we know exist, and lots of information about what populations they exist in and at what frequencies, and so forth. When Lynn Jorde is here in April, he'll be talking about lots of things. I'm sure he'll talk about data from 1000 Genomes. He'll be talking about population genomics, because population
genomics has been greatly advanced through efforts like the SNP Consortium, the HapMap Project, and 1000 Genomes, so you'll hear lots about genomic variation when Lynn Jorde gives his presentation. The other thing the 1000 Genomes Project has done, having now sequenced thousands of genomes, is that it starts to give us insights about what a typical person's genome looks like: not some hypothetical reference genome sequence, but what a person's genome looks like. So just for fun, I'll show you what some of these numbers are starting to look like. What does your genome look like? What does your patient's genome look like? And just in terms of differences. So each of our genomes is about six billion nucleotides, right? You got three billion from mom, you got three billion from dad, about six billion in total. And it turns out, and if you did the arithmetic, that's what you'd come out with, that most people have about three to five million single nucleotide variants, where a base at a given position is different compared to, let's say, the person sitting next to you. Now as I alluded to, for the great, great, great majority of those variants, if I were to sequence any one of your genomes, the great majority we've already seen. They're in the databases, already known from one of these big cataloging projects. But a small subset, about 150,000 on average, are not in databases. So with every new human genome that we sequence, we learn about 150,000 new variants that haven't been seen before, probably because they're relatively rare, maybe just in your family or maybe just in your little extended family, but fundamentally at sufficiently low frequency that we haven't seen them yet. And it actually turns out that out of your 150,000 variants that we have never seen before, 60 of them are truly unique to you and are not in either parent. So in the process of creating you and replicating the DNA of each parent, there were 60 oopsies, there were replication errors that
got introduced, and those are new. Some of those, probably the great majority of them, have no phenotypic consequence, but some of them, a small subset, probably do. Now what is the consequence of these variants on average for a given person? Well, as you might imagine, to know if a variant is relevant you have to have some knowledge about the sequence and what the functional elements are around that variant and so forth, and that's really hard for non-coding DNA. But at least for coding DNA we know the rules, because Marshall Nirenberg helped teach them to us, and so as a result we can at least say on average how many times, when you have all these variants, you have broken genes that would seem to be deleterious. And the answer is that each of us harbors about a hundred variants that disrupt, or break, a gene by some criteria, so about a hundred broken genes. Now for the great majority of those genes we have two copies, so we have one broken one and one fine one, but it does turn out that each of us carries about 20 genes where both copies are broken. So one way to think about it is that out of your 20,000 genes, about 20 of them are broken in both copies that you have, and some of that might have biological consequences and explain some of the differences among us, and some of them might be inconsequential because of the redundancy that exists in some gene families in particular. So that's what we now know, at least as of today, in terms of human genomic variation, and of course that nicely sets up the third domain that I introduced you to: using that knowledge about human genomic variation to begin to study the basis for human disease. And of course that is square in the center within this five-domain progression. Now to understand and appreciate what's happened over the last 10 or 11 years, it's important to recognize some fundamentals about genetic disease and the underlying genomic architecture. Keep in mind that every disease has a genomic influence, if not an overt genomic cause. There's practically not
a disease you can name for which I couldn't come up with some evidence of a genetic or genomic influence. But diseases are very different depending upon whether they are rare or common, and that's the dichotomy. So on the one hand you have rare diseases, diseases like sickle cell anemia, cystic fibrosis, and Huntington's disease, that are rare in the population but turn out to be genetically simple and genomically simple, because it's really a single gene that is broken that is the dominant risk for getting the disease. Yes, there might be other variants in the genome that might influence the severity of disease, and there might even be some environmental contribution, but fundamentally you're looking for one broken gene as the cause of the disease. That's why they're called monogenic disorders, or Mendelian disorders, named after the famous geneticist Gregor Mendel that I introduced you to earlier. Now these are devastating disorders, and they certainly affect patients and families in very adverse ways, but they don't represent the bulk of healthcare burden worldwide. The bulk of healthcare burden worldwide is common diseases, because these are the diseases that fill hospitals and clinics. These are diseases like hypertension and diabetes and cardiovascular disease and many forms of mental illness and so on and so forth. They're very common in the population, but unfortunately they're incredibly complicated, because it's not a single gene or a single variant but rather a series of variants that in aggregate confer risk for the disease, with what is often a greater contribution of the environment. That's why they're known as genetically complex, or multigenic, or non-Mendelian disorders. Now let me emphasize that what I'm going to talk about, because it's what I do, is the genetic and genomic contributions to disease, but do keep in mind that, especially for complex diseases, environment plays a very important role here. But there are a couple of
reasons why I'm not going to talk about the environment side of this equation. One is that it's not what I do, but the other is to keep in mind how remarkable the technological advances of the last 11 years around studying genomic variation have been. I think environmental monitoring technologies, really getting the ability to monitor our environment, will come, but the technologies just aren't nearly as powerful yet. And as a result, sometimes when we talk about this, people think, oh, you think genetics and genomics is everything. It's not. It's just that it's what we've been able to learn in the last 10 years: our technologies have given us powerful new insights about this side of the pie chart, but when we get better technologies for environmental monitoring, we'll have to fill in our knowledge on this other part of the pie chart. But what's happened in rare diseases and common diseases since the end of the genome project? Let's start with rare diseases, and it's been pretty impressive. Shown here is a cumulative graph of disorders for which we have figured out the genomic basis, where we found the defective gene. And let me put it in context: the day the genome project began, there were 61 disorders for which we knew the broken gene, 61. And you can hardly argue with the fact that as soon as some of the earliest mapping and sequencing data came out from the genome project, and beyond, the pace of our knowledge about the genomic basis of rare diseases grew steadily, and it continues to grow to the present time. To give you another perspective, we now know the genomic basis for about 5,100 disorders. There were just 61 we knew when the genome project began, and now it's 5,100, and that number grows every week. But that is the glass half full. There is a glass-half-empty side of it, because there's another 2,000, or 1,600 or so, disorders for which we know, or almost for certain believe, a single gene is involved and don't yet know the genomic basis, and another 2,000 on top of
that, or 1,700, for which we think a single gene is involved and we haven't figured it out. So a big challenge is to fill in this pie chart, and I will tell you that when David Valle is here, a world expert on rare diseases, he will be giving a lecture about this, and I am sure he will talk about these numbers and what is being done to fill in this pie chart. And we'll come back to this pie chart briefly later in my talk when I take you into the future. That's pretty impressive, though, for rare diseases. What about common diseases? Those are complicated. Would we ever be able to get enough statistical power to tease out the different variants that work together to confer risk for some of these very vexing disorders, like hypertension and diabetes and asthma and Alzheimer's disease and so forth? It really wasn't known, and in some ways we still don't know how far we'll be able to go with this. But a strategy emerged as we got better and better catalogs of the more common variants that exist in the human population: on a very large scale, take collections of individuals with and without a disease, study which of these known variants they happen to have, those with the disease and those without the disease, and scan across the genome in a genome-wide way, looking for statistical associations between having a variant of a particular flavor and getting the disease. This resulted in the idea of a genome-wide association study. People argued about whether genome-wide association studies would work or not, and whether you would be able to detect specific regions of the genome harboring variants that confer risk for a disease, and there was optimism. And by the way, Karen Mohlke, when she is here, will describe this in much more sophisticated detail than I've just done, because of course this is a lecture in and of itself and a rapidly emerging area that actually gets pretty complicated, but I know Karen will be able to talk you through this because of
the kinds of knowledge that she has and the familiarity that comes from a career dedicated to untangling this. There was some optimism that came when the first publication of a successful genome-wide association study, using some of the earliest HapMap data, was published in 2005, and it involved some NIH researchers here, in fact. That was the first paper, and as Karen will tell you, fast forward to the present time, and if you simply put a little circle or a little lollipop around every single region of the human genome that has been found by a genome-wide association study to be statistically associated with getting a disease, a complex disease, this is the latest graphic, which you can see is just littered with these little lollipops on each of the human chromosomes. Whereas 2005 saw the first paper describing this, today over 1,800 publications have reported successful genome-wide association studies, in aggregate about 3,900 associations for different regions of the genome, for about 400 different diseases and traits. Now that doesn't tell us the cause of the disease. It doesn't even always tell you which variant is the one that's implicated in that association, the actual causal variant. But it gives us lots of regions of the human genome to now interrogate in greater detail, so those are the studies that will go on, and Karen will tell you more about it. That's the glass half full on genome-wide association studies. There is a glass-half-empty part that is sobering but important to recognize, and maybe motivating to some of you. The other thing we've learned as we've dug deeper in the last 11 years and made advances in rare diseases and then common diseases is that, seemingly, for rare diseases the great, great, great majority of the variants, the mutations that cause rare diseases, basically break genes. They're mutations in coding regions. And that's in stark contrast to the genome-wide association studies, which more times than not point to non-coding portions
of the genome as the areas that have variants conferring risk for complex diseases. What does that mean? That means that the diseases that fill hospitals and clinics and have the greatest healthcare burden worldwide are being caused by variants that fall in the regions of the genome that we don't yet understand well enough. So if there's ever a motivating reason to better understand the non-coding parts of the human genome, it's because they harbor variants that are going to turn out to be incredibly important for the most common diseases afflicting humankind. And so that is a greatly motivating feature of studying the human genome and how it works, and also of following up these very complicated studies, which are still going to require a lot of hard work to tease out which of the variants are actually conferring the risk for these common diseases. One of the things that's clearly going to be required to truly tease out common disease architecture with respect to variants is very large studies that involve sequencing the genomes of not just hundreds or even thousands of individuals with or without disease, but almost certainly tens of thousands of individuals with or without a given disease of interest. And that is why it is so important that the next step is successful, and that is to be able to routinely sequence whole genomes. Routine whole-genome sequencing then becomes extremely important, initially for these first three domains, but as you'll see, it's also going to be similarly important for the more distal, clinically oriented domains as well. Well, I want to take you back to 2003, with the strategic plan I introduced you to, because the day the human genome project ended and we published the strategic plan, one of the things we called for, and we knew it back then, was that we needed technological leaps almost as soon as possible, leaps that seemed so far off at the time as to be almost fictional, but which, if they could be achieved, would revolutionize biomedical
research and clinical practice. And what we said in the strategic plan, as an example, was the ability to sequence DNA at costs that are lower by four to five orders of magnitude than the current costs, allowing a human genome to be sequenced for a thousand dollars or less. So this was put into print the day the genome project ended, the day we had just finished sequencing the first human genome, at a time when we had basically sequenced that genome at a cost of about a billion dollars, and we were calling for new technologies that would lop six zeros off of that figure and deliver a human genome sequence for a thousand dollars. Pretty audacious to say, but nonetheless it became a battle cry for the community. The thousand-dollar genome became the slogan, and the idea was: can we accelerate the pace of technological innovation in DNA sequencing in a way that would allow us to retire the factories that were used for sequencing the human genome, and here's one of those factories, and eventually produce something micro or nano or whatever, something incredibly inexpensive that would allow us to get a million-fold reduction in the cost of sequencing DNA, or more? And in fact the rest is history, and there's just been remarkable technological innovation. Shown here are some, not even all, of the platforms that have come to pass since then. I'm not going to spend much time talking about these next-generation sequencing platforms, because we have a world leader in this, Elaine Mardis, coming in May to describe this in much greater detail. But needless to say, one of these machines can now do in a day or two what thousands of researchers and hundreds of machines took about six or seven years to do during the human genome project. And the impact has been seen in terms of the cost reductions with respect to sequencing the human genome. Shown here is sort of an iconic graph that our institute puts out, because we have a lot of data, because we monitor the cost of sequencing from
our funded sequencing centers. Just to walk you through it, this is the cost of sequencing a human genome, with the y-axis being logarithmic, and shown as a white line is Moore's law. Moore's law is the law of the computer industry that says compute power doubles every 24 months or so, with the claim that nobody keeps up with Moore's law except for the computer industry. Except for genomics: we now do better than Moore's law. Shown in green is the cost of sequencing a human genome going back to the early 2000s, and right here our sequencing centers switched over to the next-generation platforms on the previous slide, and you can see they have blown Moore's law out of the water ever since. It's been really remarkable with respect to those cost reductions, but also keep in mind that it affects the speed at which you can sequence a genome. So when we sequenced that first human genome as part of the human genome project, we were actively sequencing human DNA across the whole world for about six to eight years, and it cost about a billion dollars at the end of the day. The day the genome project ended, it was estimated, just back of the envelope, that if we went to sequence a second human genome, it would still take about three to four months to pull it off, and it would still cost about 10 to 50 million dollars. Today it's a whole different world. Using these fancy new technologies, it takes anywhere from one to three days. Actually, it really does look like it's going to be down to a day or so with some of the platforms that were just released and are coming out and going to be used over the next few months. And the cost is not quite a thousand dollars. The first instruments claimed to do this, which are going to be available later this year, may be getting it down to a thousand dollars. We'll see. But the fact is it's well below 10 thousand dollars. We've seen that million-fold improvement already, and we're getting it down to a cost that is certainly approaching a thousand
dollars. And I don't worry about this stuff anymore. I feel so confident that we'll get down to a thousand dollars. It very much is like sitting at the airport just watching planes coming in for a landing. I know of new technologies that are coming on board next year, the year after, maybe a few years after that. The current suite of platforms we'll throw away in about two or three years, probably, because there'll be new ones that supplant those. Among the exciting new technologies you'll hear more and more about in 2014, because the first such instruments are hitting sequencing groups, are the nanopore ones. They talk about these fancy devices that allow you to take DNA and read it through these nanopores, and one of the instruments, which is really hitting laboratories I think this month, is a small mini device, shown here, that plugs into the USB port of your laptop and will allegedly read out a human genome sequence in something like a day. So they claim. I'm not endorsing this instrument. I just think it's way cool that it goes in a USB port of a laptop. I also think it's way cool that the company promises it works equally well in a PC or a Macintosh laptop. So I'm excited to see, and these devices are just starting to hit some of the groups, really this month or next. Remarkable advances, and I can promise you ones you don't even know about, are coming that are going to continue to reduce the cost and increase the speed of sequencing human genomes. It is a situation where I almost now refer to genome sequencing, the sequencing part of it, as a commodity. I am not sure NIH laboratories are going to be sequencing human genomes in the laboratory, or whether that's just going to be a send-out, as a commodity. Maybe companies will be better at doing this. Maybe it'll be totally outsourced. I don't know. This is not what's rate limiting anymore. We can generate the data incredibly inexpensively, and it's going to get even cheaper yet. The real bottleneck is no longer in
producing human genome sequence or DNA sequence of any kind. It is becoming a commodity. But that's not to say there's not a bottleneck, and the bottleneck is the next step, which is to be able to actually analyze that sequence. This is the biggest bottleneck now in genomics, and you'll hear about this from other speakers as well. It's a pervasive problem across all of these domains, and it's all our fault. We created this. We're the ones that said, develop new technologies, and when you develop these fancy new technologies, they will spew out lots of data. The problem is that when they're so effectively spewing out data, they overwhelm us. And the truth of the matter is that this is the reality we live in now, where data generation is not the bottleneck it used to be. Now it's data analysis that's the bottleneck, and that bottleneck has a lot of components. I mean, the truth is these fancy methods for sequencing DNA will produce a human genome sequence in a day or two or three, and we can slog through that data. It's cumbersome, but we can slog through it, and we can even come up with that list of three to five million variants in any given person. But then all of a sudden you face a lot of hardship. Doing this on a large scale, really industrializing it to the kind of scale that's needed for the studies that need to be done, now has bottlenecks in terms of just hardware, just dealing with the sheer amount of data and pushing the data from site to site and having enough processors to analyze the amounts of data we're dealing with. There are all sorts of issues around software tools, because while every one of these platforms is way cool, every one of the platforms needs specialized software to deal with the idiosyncrasies of the data it produces. There are trainees listening here: there are huge issues around workforce, especially the next-generation workforce, thinking about what is the next generation of biomedical researcher, let alone genome scientist. What do they look
like, and how are we going to get them adequately trained to deal with what is going to be basically heavy lifting of large data sets? Oh, and even when we get to the point of having that list of those three to five million variants, trust me, we ponder and ponder and ponder. For most of those variants, we don't know automatically whether a variant is biologically relevant or not. Every one of these steps represents a bottleneck, and it really has multiple aspects to it. It's also not unique to genomics; there are other areas of biomedical research that have similar bottlenecks. It is the reason why, as Andy alluded to, a major emphasis of this series of lectures is to work through the bioinformatics, computational biology, data science side of genomics, and that is why you can see that Andy and Tira just between themselves have three lectures to walk you through this, because it's such a vital and integral part of what's going on in genomics now. So those are the steps I really wanted to highlight as being the accomplishments since the end of the genome project. But what about these more clinically oriented areas? Because I've led you almost to genomic medicine, but I haven't given you new diagnostics, I haven't given you new therapeutics, I haven't given you new preventative measures, and then there are probably things that I haven't thought of that you're wondering about, or maybe things that none of us have thought about and will have to deal with when we encounter them. So with that in mind, I wanted to shift gears, as I promised you earlier, and take you into the future and give you some perspective. This is really, you know, fast moving and changing, and these are really my predictions about where all this is heading, and you'll see some of my predictions also include where we at NHGRI, at least, and others at NIH are investing. What I'll tell you more than anything about the future is that the future will reflect the past, and what we've learned in the past is that
technology drives this stuff more than anything. Let me just remind you of some major advances that you've all seen in different areas of science. You know, the telescope really advanced astronomy profoundly. Clearly the microscope advanced cell biology profoundly. New methods for imaging really advanced radiology profoundly. And these fancy new instruments that are giving us tools to read out genome sequences are really going to be driving us technologically into the future. Another way, again with the theme of looking back before we look forward, to think about the genomic accomplishments we can expect for the next 20 years is to think back a little bit on what it's been like over the last 20 years. I've just walked you through some of that, but just to summarize, and this is sort of the central figure in our strategic plan, we found it was very useful, from a taking-stock point of view, to simply reflect on where the genomic accomplishments have been across these five areas over the last 20 years and then make a prediction about where they're going to be over the next 20 years. All hypothetical: this is just sitting down with a graphic artist and representing it as a hypothetical view, doing this as a density plot where each hypothetical accomplishment gets a blue dot, and as they pile up on each other they change colors and eventually become red, and doing it across different time intervals. The genome project as a starter: where were the accomplishments? In the genome project it was the first domain. Yeah, we learned a little bit about how genomes work, but fundamentally all the heat was in that first domain. But from the end of the genome project to, let's say, 2010, the center of gravity shifted as we launched efforts like ENCODE and others, and we launched efforts to get knowledge about genomic variation. We learned more and more both about the structure of genomes, including variation, but also about how genomes work, and with that, yeah, yeah,
we got some insights about disease, but basically it was in the first two domains that the center of gravity sat. What about now to the end of the decade? Here you're going to see it shift over, so that the two most progressive areas of genomics are going to be better and better understanding of how the genome works, but also in particular, and this is new, really advancing understanding about the biology of disease. Not yet advancing medical science profoundly, not yet improving healthcare. I think that's coming, but I'd be realistic here: we're not going to change medical science and medical practice and healthcare delivery overnight, or even in a decade. But it's coming, and I'm fully confident you will see a progression rightward, but I think the real heat between now and the end of the decade is going to be in the second and the third domains in particular. So I also want to remind you that I regard the first two domains as basic genomics: technologies for studying DNA, methods for understanding how the genome works, understanding human genomic variation, and so forth. That's not genomic medicine. Nor, really, is this middle domain, where we're going to see a lot of advances this decade. I regard this as discovery, discovering which variants are the ones that are conferring risk or overtly causing disease. Again, that's not quite genomic medicine. We'll talk about genomic medicine in a minute. It's discovery research. And what I really want to tell you about now is how we are going to create these significant advances that I'm predicting between now and the end of the decade in discovering the genomic basis of disease. Well, let me give you some flavors. It's going to involve these fancy new sequencing methods, and they are being applied on a very large scale. I can tell you, for example, we have a major program, and David Valle, who will be here, is one of our grantees. A series of new centers has been created whose goal is to fill in that pie chart,
to industrialize the process of taking these remaining disorders and using these really remarkable sequencing methods to get at the underlying defective gene. We predict we'll be able to do a significant amount of filling in of this pie chart between now and the end of the decade. The program is called the Centers for Mendelian Genomics, once again named after Gregor Mendel, and if you're interested, there's a paper that describes that program and how you can interact with those centers.
For complex diseases, a big part of it is figuring out more information about those lollipop regions, figuring out the underlying variants that sit in those regions, and figuring out which ones are conferring risk for disease. On that we've put our biggest centers, the ones that have been involved in genomics on a large scale since the Genome Project, and you are seeing them tackle disorders like Alzheimer's disease, diabetes, autism, schizophrenia, and so forth. In addition, those same groups are attacking cancer, because of course cancer is a disease of the genome. In fact, probably the most exciting advances in clinical genomics relate to cancer, recognizing that as tumors develop, they pick up various mutations in their genome. And of course the same methods that we use to sequence regular old DNA can be used to sequence tumor DNA. Through efforts such as The Cancer Genome Atlas, which is the flagship effort in the United States for sequencing tumor genomes, we are learning a tremendous amount. The Cancer Genome Atlas has picked a couple dozen types of cancer; it's now sequencing literally hundreds and hundreds of specimens, cataloging all the changes that take place in those cancers, and making all that data available. It is one example of cancer genomics. This is going on in other countries as well; there are entire international consortiums set up to do this, and it's really greatly advancing our understanding of cancer. I
am sure Elaine Mardis will likely mention this again; in addition to being a technology expert, she's heavily involved in cancer genomics.
As an aside, but to contextualize a future speaker: the same power of sequencing human DNA can also be applied to sequencing microbes, perhaps not directly related to understanding human genomic disease, but to understanding other aspects of human disease. Take efforts such as the Human Microbiome Project, the microbiome being all the organisms in various communities in various locations. This is a major NIH effort, and you're going to hear from Julie Segre, an intramural investigator here, in June about how these same powerful technologies of DNA sequencing are being focused on understanding infectious disease and understanding the microbiome. This is an aside, but again, it's very much in the discovery zone of understanding aspects of human disease.
So that's discovery, and that's what we can expect: rare diseases, common diseases, diseases like cancer. But what about genomic medicine? Genomic medicine is really these more distal domains, and as you can see, it's not so much about what's going to happen this decade, except that we want to lay a foundation for what's going to happen in subsequent decades. We have an entire lecture dedicated to genomic medicine, because it really is now beginning to come into focus. That's going to be given by Bruce Korf, who will be visiting here in May and will give a lecture exclusively on genomic medicine. But what I thought I would do, just for about five or ten minutes, is tell you my view (and I'm sure Bruce will talk about some of this) of what some of the hottest areas in genomic medicine are, and in particular the things we're doing to build that foundation this decade to help bring in the advances I predict will start happening next decade. There are really six hot areas I just want to briefly introduce you to, and I'm sure Bruce and others might touch on these. I just mentioned one of them:
cancer genomics. This is probably the lowest-hanging fruit in terms of clinical applications of genomics. We've gotten very used to the idea of having radiographic imaging as part of the diagnostic armamentarium for patient care. We are not far away (in fact, we're already there for some cancers) from the point where routinely sequencing a tumor genome and developing maps like this, which reflect a sort of high-altitude view of the derangements of a given tumor's genome, will become part of the standard armamentarium for cancer diagnostics. So while today you have pathologists toiling away, looking under a microscope and saying, "Ah, this is a breast cancer, this is a prostate cancer," et cetera, in the future (and for some cancers that future is now), yes, the pathologists will still look under a microscope, but they won't want to work out all the diagnostic details or make any sort of prognostic predictions without knowing what the derangement map of that individual tumor looks like. So again, you absolutely can expect this, and there's lots of research going on to make it a reality. Meanwhile, you hear about it; I hear this on the news. I'm not endorsing this particular group, but you're starting to hear about how genomic testing is the future of cancer treatment, and they're using the word genomics. In fact, I've predicted that for the general public, maybe the first time they understand the word genomics will be when they or a family member or a friend have a cancer diagnosis and the word genomics permeates the discussion with the oncologist about how they're going to treat that patient. Again, not hypothetical: here and now.
The second area, also not hypothetical, also here and now, is a big word: pharmacogenomics. It recognizes that everybody responds differently (in this case it's to a roller coaster), but everybody responds differently to medications. We know that CVS means well, because every medication they sell you works; it just doesn't
work in everybody. What we've learned in the past 11 years in particular is that the reason we respond differently, for the most part (not exclusively, but for many medications), is because of variants that we have in our genome that affect drug metabolism pathways and so forth. And we're learning that we can indeed figure out in advance whether you're going to be a good responder or a bad responder to a medication, and use that information to guide therapy. It's such an important topic that it's the reason we're bringing Howard McLeod here, a world expert, to tell us about pharmacogenomics and the latest advances in getting the right drug, and the right dose of that drug, to the right patient.
A third hot area reflects the fact that using genomics for medical care is not hypothetical. It really can start to be done now, but we need to get much better familiarity with it; we need to take these new technologies and new approaches out for a test ride. So at NHGRI we have a couple of test-drive programs, as I call them, where, as part of a clinical research project, we really are sequencing patients' genomes and figuring out how to do this and reduce it to practice. You might hear more and more about our Clinical Sequencing Exploratory Research network, which now consists of about nine different groups who are studying, in different settings (some in cancer, some in neonates, and so forth), genome sequencing as part of the routine care of patients. Meanwhile, there are some success stories, and those success stories need to be learned from, and then those successes disseminated to other sites, including diverse sites for clinical delivery of care. So we've created another network, called the IGNITE network, whose goal is to take best practices and success stories in genomic medicine and see that they get propagated to other sites as well. So again, this is not hypothetical; this really is happening in places, and we're trying to learn from it. And the FDA
recognizes this is not hypothetical; this really is here and now. Some of you may have seen last year a release from the FDA, and Francis Collins commented on this as well, and the leaders of both of these agencies published a paper in the New England Journal. The reason for all this flurry of activity was that the FDA has actually authorized ("cleared," in other words) one of these fancy next-generation genome sequencing platforms for clinical use. So again, the FDA is in the middle of this because it's here and now; it's not just hypothetical. And indeed the FDA is getting quite involved, recognizing that genomics and genome sequencing are going to become part of the mainstream of clinical medicine.
A fourth area, which I also think is way cool, and which is certainly very much on people's minds, relates to either newborn or prenatal circumstances. Let's deal with each of them separately, because there's separate stuff going on in both. In the case of prenatal genomic analyses, it's very commonplace for pregnant women to get amniocentesis or chorionic villus sampling. A standard battery of tests is typically run in the prenatal setting to determine the health of that unborn child. But it's pretty invasive to have amniocentesis or chorionic villus sampling done. Guess what: those new methods for sequencing DNA are so exquisitely sensitive that they can actually get all the same information by simply detecting the fetal DNA that is floating around at very low levels in the maternal bloodstream. Again, this is not hypothetical; this is here and now. Noninvasive prenatal testing is being published in the literature right and left, it's winning awards right and left, and it is one of the rages, if you will, in prenatal care. There are companies forming up all around this, because it is believed that amniocentesis and chorionic villus sampling will fall off precipitously, because a simple blood draw from a pregnant
woman will suffice, using these new methods, to learn the same information we were learning by getting DNA directly through these more invasive methods. In fact, even the popular press is talking about these major breakthroughs in prenatal screening.
At the other end of that pregnancy, of course, we have newborns. Here in a country like the United States, essentially every newborn is tested at a couple of days old with a little heel stick and a little bit of filter paper where that blood is collected, called a Guthrie card. It often goes to a state lab, where that newborn child is tested for anywhere between two and three dozen genetic diseases. But we now know the genomic basis of 5,100 rare diseases, and maybe someday we may want to get a more comprehensive understanding of the health and well-being of that newborn with respect to what diseases they might be susceptible to. But wow, that's going to be complicated; think of all the ethical issues and what to do with the information. It might be powerful, but it also might raise things we need to be cognizant of. So we partnered with the Child Health Institute and created a new program on genome sequencing in newborns, called NSIGHT, which is just getting out of the gates in the past couple of months, to have a series of researchers study what the new world might look like with the ability to routinely sequence newborns. What are the ethical issues? What are the logistical constraints? What can you learn? Is this going to be beneficial? Let's study this to pave the way toward the future. Again, it's a program just out of the gates.
The fifth area in terms of hot topics really relates, again, to a bottleneck I alluded to earlier: just understanding at a clinical level what all this genomic information really means. We don't want to end up in a circumstance where we can sequence a patient's genome routinely, but then when we go to round on that patient in the morning, we're just stuck wondering what the heck all this
means, even if we just have a list of three to five million variants. We've got to empower these healthcare professionals with an easy, immediate way to look up information about the variants in that patient sitting in that bed and know what to do with it. And yet we are in the early days of understanding exactly what to do. That is the reason we are looking at developing clinical genomics information systems: ones that will integrate with electronic health records, as they should, but also ones that will immediately deliver the kind of information a healthcare professional will access, most likely through some sort of convenient device, and just tell them, whether it's a physician, a nurse, or a pharmacist, what they need to know about that genomic variant in that patient. So we've launched, again just a couple of months ago, a new program called the Clinical Genome Resource to start to develop the approaches for building that knowledge base and to figure out how to automate this, because the amount of literature coming in is huge and the amount of knowledge we need to transmit is going to be huge. We've got to figure out ways to do this efficiently.
And lastly, there is the use of genome sequencing to deal with ultra-rare genetic diseases, something that I think is very familiar to many people right here in the NIH Clinical Center, in particular through the Undiagnosed Diseases Program, which has brought a lot of goodwill to this Clinical Center. It is now very straightforward and understandable that if you have a patient with a very rare disease, it makes a lot of sense just to sequence their genome, because you've probably already done a huge workup on the person and you haven't figured out what's wrong with them. You can see review article after review article describing how people around the world are doing this, because sequencing as part of a diagnostic test, as part of a very long diagnostic
odyssey just makes a lot of sense. You don't always figure out what's wrong with that patient, but boy, you do enough of the time that it makes it worthwhile. That is the reason NIH is expanding the Undiagnosed Diseases Program, which started right here in the Clinical Center, and is now developing a national network of these centers that's going to build upon the successful experience of the intramural program here to improve the diagnosis and care of patients with undiagnosed diseases, facilitate research into the etiology of undiagnosed diseases, and create a highly collaborative research community to identify best practices for the diagnosis and management of undiagnosed diseases. Later this year we'll be announcing a series of new sites that are going to be set up around the country and that will interact with the NIH Clinical Center and its Undiagnosed Diseases Program to create a larger catchment system and to really scale, if you will, this ability to go from a very rare disease to trying to help that patient, which in most cases will include genome sequencing.
So that's most of what I wanted to tell you. I hope that what I've done, in taking you over the last 25 years and describing this landscape, is also convince you that the relevance of genomics has changed a lot in that quarter century. When I got involved on day one of the Genome Project, or even year minus one, when I first got involved in genomics, genomics was really about biomedical researchers in laboratories and at computers, toiling away. It was research. But since the Genome Project ended, we've increasingly engaged our healthcare professional partners, and they've recognized that the future is going to be genomic medicine; they've gotten involved in this and they've gotten much more aware of genomics. The big phase change I see happening this decade, and in particular next decade, is that genomics is going to become relevant to patients, and therefore to friends and
relatives of patients. It's all of us, because we're all going to know somebody who has cancer; we're all going to know somebody who's going to have one of the disorders I've talked about. As a result, even the word genomics is going to become very relevant to us. We think a lot about that. Part of the responsibility of the field of genomics is to recognize that this has gone from a scientific to a biomedical to now, in many ways, a societal discipline, and we very much embrace the idea of thinking about the societal implications of genomics research. Some of you may have heard about our Ethical, Legal, and Social Implications research program, which has been ongoing since the beginning of the Genome Project and continues to thrive at our institute. But even more broadly than that, we think a lot about some of the public health considerations, some of the educational issues, genomic literacy, and so forth. When Colleen McBride comes here in April to give a lecture, she'll be talking about the public health implications of genomics, which again is cast around some of these issues, thinking more broadly about genomics and society.
The second-to-last thing I wanted to tell you is that we're very interested in literacy around genomics, in particular public literacy, thinking about the patients who will have to encounter genomic information about themselves or their loved ones. We're doing various things, but I have to put in a plug, because it's way cool, I'm particularly proud of it, and you're here in the DC area. One of the things we've done, and this came out last year, was to partner with the Smithsonian Institution to create a public exhibition about genomics and have it be at the National Museum of Natural History, right here on the Mall in DC. If you're not familiar with this or haven't heard about it, it's an exhibition called Genome: Unlocking Life's Code, a partnership between these two great institutions. It opened in June and
it's going to be here until September. So I want to tell you about it and convince you to please go see it before it leaves the DC area in September. If you need to find it, it's really easy to find: you just go to the Hope Diamond on the second floor of the museum, take a left, and we're in the hall immediately adjacent. Now, if that doesn't work, just remember how many pairs of human chromosomes there are: 23 pairs of human chromosomes, and it's in Hall 23. So that's the other easy way to remember it. Whenever you get there, you will follow what have now been about 1.6 million people who have visited the exhibition since June, and if you get there before September, you'll be among what we expect to be over 3 million people who will have visited it. When it leaves here in September, it then tours around North America for about four to five years, including Canada, so you can go visit it in another city as well. But it's easy: just jump on the Metro and go see it before it leaves. If you want to read more about the exhibition, and about a lot of affiliated programming and educational material, this is its dedicated website, unlockinglifescode.org.
And the last plug, and then I really will be done: I have come to realize, with the fast pace of genomics and lots of new stuff (in fact, that whole last section I just went through with you is all stuff where some of those programs just started in the past few months), that so much happens in genomics, including coming out of our institute, that people find they love to get updates. So I was convinced, especially by advisors, to try to develop a way to communicate what's coming out of the institute. If you're interested in getting updates about what's happening, I now put out a monthly newsletter. If you just go to that URL and follow the links, you can subscribe to it. You don't have to,
but I'm just saying, if you're interested in a little two-pager about once a month (it comes out at the beginning of every month), please feel free to subscribe. Every one of the new programs I was just talking about has been featured in that monthly newsletter in the last few months. So I will stop there, I think, in the interest of time, and I have used the time. I will just end and thank you for your attention, and rather than taking questions, I'll just stick around the podium, and anybody can come down and talk to me individually. Thank you very much.