 Thank you, Greg and Bob, and it's a pleasure to be here. I've had the pleasure of a long collaboration with people at NHGRI. We recently actually have instituted a combined fellowship program in genetics with the intramural program at NHGRI. And, of course, now Suburban and Johns Hopkins have joined together, so we have lots to celebrate and to look forward to in terms of continued interactions. And so I'm pleased to be here to have a chance to talk to you about my favorite subject. I'm also pleased to be the leadoff speaker in this series, which Greg and Bob have put together, which looks quite good. And so I hope to sort of set the stage for many of the people that come later. Please interrupt me if I'm not making myself clear, or if you want clarification on some point, or if I'm talking about stuff that's old hat, tell me to move ahead and not spend time on it. So I'd love to make this as informal and as interactive as possible. So what I would like to talk about is the human genome and what I call individualized medicine. And so we could start off by saying what is individualized medicine? I'm going to put this out of the way. And so I turned to Francis for guidance here, Francis Collins. And Francis said back in 2005 that at its most basic, personalized medicine refers to using information about a person's genetic makeup to tailor strategies for the detection, treatment, or prevention of disease. And I think that sums it up quite well. The only place that I disagree with Francis is that I much prefer the term individualized as opposed to personalized. And I'm going to take the prerogative here since I have the podium to tell you why. So personal, if you look in the dictionary, has two meanings, really. It has relating to someone's private life, a sense of intimacy. And the other is relating to one person, a particular individual. Now, medicine has always been personal. As physicians, we are allowed to ask people questions that no one else asks them. 
We are allowed to examine them in ways that no one else examines them. So medicine has always been personal in my view, but it has not been individualized. And so that's what we're looking at, a way to consider each person as an individual and to adjust our thinking about the patient to account for their individual strengths and weaknesses. So for that reason and for others, which I won't go into, I urge you to consider thinking of this as individualized medicine. Now, why would we consider the topic of individualized medicine now? Why is it so much in the press and in the news and so forth? And that's particularly worth asking because modern medicine really has had enormous successes. So we've had a dramatic prolongation of the lifestyle, of the lifespan, and a dramatic improved quality of life. So medicine has been doing a good job. On the other hand, there are ongoing concerns. Many diseases have an increasing incidence. There's an unacceptable frequency of adverse events, adverse therapeutic events. We all hear daily about the cost of medicine continuing to go up. And if you talk to patients, and all of you do all the time, they usually say they want two things from their physician. They want a physician who's smart, has good knowledge, but they also want a physician who cares about them as an individual, as a person. So I think we have an opportunity to move medicine from a very successful level to a new plateau. And I think the way we will do that is by individualizing medical care. And so to put it in a different way, I like to think about a particular disease, and the one I would mention is type 2 diabetes. As you know, its incidence is increasing throughout the industrialized world, and it's intertwined with the increasing incidence of obesity, and it's a chronic illness with an array of complications, microvascular complications, and macrovascular complications. 
But suppose a member of your family or a close friend of yours has type 2 diabetes. Would you like to know the prognosis and response to existing treatments for the average patient with type 2 diabetes? Or would you like to know as precisely as possible the specific features, prognosis, and response to therapy for your loved one or your close acquaintance? Or, even better, could we imagine knowing ahead of time who is at high risk for type 2 diabetes and actually prevent the illness from ever occurring? So that's the goal here: to try to identify the individual strengths and weaknesses of our patients and, as much as possible, prevent them from ever getting sick. But if they do get sick, then to individualize our counseling and our treatment and to optimize it as quickly as possible. So, in that regard, I think there are two characteristics of modern medicine, current medicine. The first is that in medical school and currently, I think we have been trained to perform what I call average medicine. Now, by average, I don't mean it in the pejorative sense, that it's mediocre. I mean that when we make a diagnosis, we think about what is appropriate for the average patient with that diagnosis. And part of that comes from the way we're trained, and I call that aspect of our training the classic case mentality. So the little boy on the left is a patient that I saw, and this is his sister, and they both have a genetic syndrome that's characterized by some abnormal physical findings. And in days gone by, what would happen is the family would come to the clinic. We would take a family history and a history of the present illness, do a physical exam, some x-rays, some routine laboratory tests, and all who had seen the patient would get together and say, well, I think it's a case of this, or I think it's a case of that. 
Usually what would happen, at least at Hopkins, would be that the person with the most gray hair or the least hair would finally make a pronouncement: I think this is a case of whatever. And then our thinking would become constrained. We would start thinking of the patient as an example of a particular disease rather than thinking of the patient as an individual who happens to have this set of problems. And at the time, in terms of the tools that we had available, that was the way we had to practice medicine. Now, the other aspect of medical practice in the 20th century is what I call trial and error medicine. That is to say, as you all know, we see a patient, we make a diagnosis, we think about what kind of interventions we would like to make, we make some baseline measurements, we make the intervention, then we follow the patient and repeat the baseline measurements and ask: with our intervention, is the patient doing better, staying the same, or in fact worse? If they're staying the same or worse, then perhaps we'll change that intervention and do something else. So this is a trial and error kind of medicine. The goal in the future would be to be able to predict with a fair degree of accuracy what would be the best treatment for this patient without having to go through this trial and error set of protocols. So, thinking about the patient as an individual: I think the more experience physicians have, the more they come to learn this, despite the fact that in medical school, classically, you were taught that this is what happens with a case of this and this is what happens with a case of that. As you get more and more experienced, you begin to realize that no two patients, even patients with the same diagnosis, behave exactly alike in terms of their complications, their response to treatment, and so forth. So that's a lesson that we tend to learn by experience in the trenches. 
And this point was actually emphasized by Owsei Temkin, who was a professor of the history of medicine at Hopkins, now deceased. He said there is no science of the individual, and medicine suffers from a fundamental contradiction. Its practice deals with the individual. In other words, the person that comes to see us is an individual. While its theory, what we learned in medical school, grasps universals only. So in days gone by you were left to individualize your approaches and your thinking after you got out of medical school. This idea is not new, actually. A colleague of mine pointed out to me that back in 350 BC, none other than Aristotle said the doctor does not treat man except accidentally. He treats Callias or Socrates or somebody else. So if someone knows the universal without knowing the individuals contained in it, he will often fail in his treatment, for it is the individual who has to be treated. So it's not a new idea. So what I keep reminding our students, and what I think we have to think about, is that when we see a large number of patients (I'm a pediatrician, so this is where I start seeing my patients), we have to remind ourselves that each of these individuals has his or her own unique sampling of our species' genetic endowment. Each has a unique history of in utero development, and each is born into a family with a unique constellation of socio-economic variables. So all of those factors, the genetic makeup, the early development, and the social-cultural milieu, individualize each patient and have an influence on what diseases they are at risk for and how they will respond to our attempts to prevent or treat those diseases. So this is all well and good, but you could ask what has changed, what makes it possible to contemplate moving medicine from its current successful level to an even more successful level as we go forward. 
So I would submit, and I'm a geneticist, so I would submit that the major driver for this has been the Human Genome Project and what we've learned about the genetic makeup of members of our species. So the Human Genome Project sequencing technology and the appreciation of sequence variation, what's come to be called whole genome sequence biology, an increasing prominence of evolutionary thinking in medicine, a progress in disease gene identification, and sort of what has been, in my view, a watershed, the ability to obtain individual genome sequences on individual patients. So I'm going to talk about each of these bullet points briefly. So first of all, let's turn to the Human Genome Project and sequencing technology and genetic variation. So you all know that the genome project really was contemplated in the mid-80s, and there was a good bit of argument initially about whether or not it was a good idea and a useful way to spend research dollars. But eventually the argument carried the day and the genome project got started officially on October 1st, 1990, under the direction of Jim Watson across the street. Francis took over in 1993, Francis Collins, and initially it focused on technology and model organisms, yeast and C. elegans and flies and so forth. In the mid-90s it turned its attention almost completely to the human genome, and in fact it was a competition between the so-called public group headed up by Francis and the private group headed up by Craig Venter. And miraculously, both groups finished on the same day, as shown here in this front page of the New York Times. This is Tuesday, June 27th, 2000, when both groups announced the fact that they had a draft sequence of the human genome. The public group went on to do more than a draft to do a very high quality complete sequence of the human genome, and that was finished in about 2003. So let me just make sure we're all on the same page with terms. I forgot to bring a pointer. Is there a pointer here? 
So, since it's been a while since some of you may have thought about this, what I've shown here is a diagram of a gene. Amazingly, the word gene was coined in about 1908 by a man by the name of Johannsen. And if you look at how geneticists have defined what a gene is over the time since then, the definition keeps changing. In fact, if you put 10 geneticists in a room right now and ask them for a definition of a gene, you might get 12 definitions. So let me tell you, so we're more or less all on the same page, what most of us mean. What I've shown on this diagram is a mammalian gene, a gene that encodes a protein. And it turns out the pieces of the gene that actually account for the coding sequence are called exons. They're the pieces that, thanks, Greg, that are retained. So here are the exons. This is a four-exon gene. Those are the pieces of the gene transcribed into RNA, and then there's splicing that goes on. The RNA corresponding to these four segments ends up in the mature mRNA that goes out into the cytosol. And then there are pieces of DNA between the exons, and we call those introns. The transcription of the gene to RNA would start right here. It would go like this, and then these intronic pieces would be spliced out, and the mature message would be made up of a sequence that corresponds to exons 1, 2, 3, and 4 all stitched together. So these are the exons. The purple line is the introns. And up at the front of the gene, the 5-prime end, there are some regulatory sequences; we call that the promoter. There might be some distant regulatory sequences way away that we call enhancers. And the translation would start here once the RNA was made, and all of these would give information about making the protein that corresponds to the product of this gene, and then there would be the 3-prime UTR, the untranslated region of the message. So this is a classic protein-coding gene. 
Now, we now know that there are other genes in the genome that encode RNA, but the RNA is never translated into protein. So there's a set of RNA genes as well as protein genes. For most geneticists, and for what I'm going to tell you today, when I say gene, I mean genes that encode protein, like this one here. So if we look at where we stood in 2003 in terms of understanding the human genome, there are some simple features that I just want to remind you of. First of all, if we count up the number of genes in the genome, it turns out there are about 22,000 genes in the human genome. Now, this was a big surprise to everybody. We had a pool about how many genes it would take to make a human, and of course, because we're egocentric, most of us guessed way high. I guessed 100,000 genes. And it turns out that if you look at all organisms that have bilateral symmetry, from flatworms to butterflies to fruit flies, it's about 20,000 genes. So for some reason, we don't know why, this is the sweet spot for genes in the biological kingdom. Right now we know the function of about 75% of these 22,000 human genes. So there's still a good number of genes in the genome for which we don't even have a clue as to what their function is. Those exons, those pieces of genes that actually get spliced and retained in the messenger RNA and go out into the cytosol and are used to direct the synthesis of proteins, we can actually count them up now because we've got the sequence of the whole genome, and there are about 220,000 exons in the genome. And those exons are distributed over about 50 megabases. The entire genome is about 3,000 megabases, or 3 gigabases. And over here is a comparison to the mouse, and there's actually pretty good similarity between the mouse and the human genome. So 22,000 genes, about 220,000 exons. And the exome, which is what we've come to call that portion of the genome that comprises all of the exons, is about 50 megabases. 
That's only about 1.5% of the total genome. So there's a lot of the genome that doesn't seem to have much function. If you look, if you put an evolutionary test to it and ask how much of the genome is conserved over evolutionary time, it's about 5%. So there's an additional 3.5... The exons are very conserved, so there's an additional 3.5% that's conserved. That means it must have some function. We're not sure exactly what that function is. So there's still a lot to learn, but at least we have a list of the parts at this point. The reference sequence was done roughly 2003. We said, okay, we've got one human reference sequence, but if we look around the room, we can see no two people are alike. So what we really need to, if we want to move forward, what we needed to do at that point was to understand something about the extent of genetic variation in our species. And so the genome, the people involved in the genome project turned their attention to enumerating human genetic variation. We knew early on that one human to the next is pretty similar. The current number is around 99.5 or 99.6%, identical one person in the room to the next person in the room. And some people said, wow, that's an extreme degree of similarity. But if you think about it from an evolutionary point of view, Homo sapiens is a very young species, started from a very small number of founders. And so this is about the evolutionary spread, you would guess, over that period of time. And we're actually pretty close to our relatives. For example, in the coding sequence, we're between 70 and 90% identical with the mouse, and we're 98.5% identical with our closest living relative, the chimp. So on the other hand, 0.4% of 3 billion bases is actually a pretty big number, right? So there's a lot of chance for a difference one person for the next. 
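As a rough check on the figures quoted so far, here is a minimal Python sketch. The constants are simply the round numbers from the talk, and the variable and function names are illustrative assumptions, not part of any library. With these round inputs the exome works out to roughly 1.7% of the genome and the conserved non-exonic share to roughly 3.3%, in the same ballpark as the "about 1.5%" and "3.5%" mentioned above.

```python
# Round numbers quoted in the talk; all names here are illustrative assumptions.
GENOME_MB = 3_000           # whole genome, in megabases
EXOME_MB = 50               # all exons combined, in megabases
CONSERVED_FRACTION = 0.05   # share of the genome conserved over evolutionary time
GENES = 22_000
EXONS = 220_000

exome_fraction = EXOME_MB / GENOME_MB                    # share of genome that is exonic
exons_per_gene = EXONS / GENES                           # average exons per gene
conserved_noncoding = CONSERVED_FRACTION - exome_fraction


def differing_bases(identity, genome_bases=3_000_000_000):
    """Bases expected to differ between two genomes at a given fractional identity."""
    return round((1 - identity) * genome_bases)


print(f"exome share of genome: {exome_fraction:.1%}")
print(f"average exons per gene: {exons_per_gene:.0f}")
print(f"conserved but non-exonic: {conserved_noncoding:.1%}")
print(f"differences at 99.6% identity: {differing_bases(0.996):,}")
```

The last line makes the closing point concrete: even at 99.6% identity, two people differ at on the order of 12 million positions.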
So the genome project turned to enumerating that difference, and the first project was called the HapMap, which studied three populations of humans from around the world, Northern Europeans, West Africans, and Asians, trying to find all the common variation. That was followed by the current project, which is called the 1000 Genomes Project. And actually the current goals of the 1000 Genomes Project are to study about 2,500 individuals from about 50 populations around the world. And the idea is to catalog at least 90% of the variants that have a frequency somewhere in the world of at least 1% amongst human populations, and in the coding sequence, that exome part of the genome, to catalog all variants that have a frequency of at least 0.1%. So in other words, when the 1000 Genomes Project gets done, we can look forward to having a pretty good handle on all variation that's common in the genome across our species in various places in the world. There's tons of rare variation that won't be detected by this strategy, so we'll continue to find the rare variation as we go forward. But at least we'll begin to have the common variation in our species. So what kind of variation is there? There are several categories, and I'm just going to briefly mention them and focus on two. First of all, there are small insertions and deletions. This would be like a few bases inserted in one place in the genome. And very often where they're inserted is some part of the genome that's non-functional, so it doesn't make any difference. Geneticists call these insertions or deletions indels, and they make up about 10% of the variation in sequence. There are some length polymorphisms. These tend to be sequences that are also short, maybe two nucleotides or three nucleotides, repeated over and over again, typically in non-functional parts of the genome, but not always. That makes up about 5% of the variation. 
The kind of variant that makes up a large chunk of the variation, and that I think you've read about and heard about, is the single nucleotide polymorphism, and I'll talk a little bit more about those. They make up about 40-45% of the variation. And the other big kind of variation, which we didn't really even anticipate in 2003 but have learned about since then, and which we know accounts for a lot of variation, is so-called copy number variants. I'll show you what those are, and they make up about another 40 or 45% of the variation. So most of the variation is in these two categories, at least as far as we know: single nucleotide polymorphisms, or SNPs, and copy number variants, or CNVs. Now, there are also variants where a chunk of the genome is broken at both ends and flipped around. That's called an inversion, and it can cause a problem if the breakpoints are in protein-coding genes. Those are hard to detect, and we don't really know the extent of inversions as a contribution to variation yet. We know certainly of some inversions that make a difference, but that's an area we need to learn a lot more about. Then, of course, at each generation, the chromosomes undergo recombination, so that variants are reshuffled in terms of how they're distributed from one generation to the next. So there's a lot of genetic variation. Now, let me just make sure we're all on the same page in terms of understanding single nucleotide polymorphisms and copy number variants. Here's a single nucleotide polymorphism. Here's part of the sequence: G-A-T-C-A. At this particular place, this T, there's a second form of the gene, a different allele, meaning a form of the gene. It's exactly the same here and here, but at this one position it differs, and in this case, it's a T in the one form and a G in the other form. So it's a single nucleotide variant, or polymorphism. Single nucleotide polymorphisms occur about one in every 1,000 base pairs in the genome. 
In some areas they're a little bit more common and in some areas a little bit less common, but that's enough to give you about 3 million or so variants per haploid genome per individual. So that's a lot of variants, and they matter to the extent that they occur in key functional regions of the genome. Moreover, these variants have become very easy to measure. The technology's been developed to easily and accurately measure, at this position, let's say, whether the person has a T or a G on one chromosome and whether they have a T or a G on the other chromosome at that position. That's called single nucleotide polymorphism, or SNP, genotyping, and we have chips that do that, and the standard platform right now measures about a million SNPs across the genome. We have a big center over at Hopkins called the Center for Inherited Disease Research, and we do this sort of genotyping on thousands of patients per day, measuring these variants, and so we use the SNPs as little tags to identify regions of the genome and how they've been transmitted down through the generations. So we'll come back to that in a minute. Now let me say a word about copy number variations. So here are two chromosome pairs. Think of this as perhaps the chromosome that you inherited from your mother, and here's the corresponding chromosome that you inherited from your father, and in this region there's a little deletion in this chromosome, so that this piece of DNA that's here on your mother's chromosome is not there on your father's chromosome. Now it turned out that cytogenetics, looking at chromosomes in the microscope, has a resolution down to about 3 million base pairs, 3 megabases. That means a really good laboratory can see a deletion or a duplication in a chromosome if it's at least 3 megabases or bigger. And standard molecular techniques, of course, were gauged to find changes in the sequence of a few base pairs, one or two or three base pairs. 
So if we had been smart enough a few years ago, someone would have said, well, wait a minute, you're looking at the genome with two technologies, one that has a resolution down to about 3 megabases and another whose sweet spot of resolution is on the order of a single base or a few bases, so you're not looking at changes in the size interval between those two technologies. And sure enough, it turns out that copy number variations fall in that interval. Here's a different kind, a duplication in this region of the genome. So this chromosome is actually shorter by the amount that's deleted, and this chromosome is longer because that region is duplicated. So it turns out that there's a lot of copy number variation in our genome. That means that if a region of DNA is deleted off of this chromosome, then this individual, instead of having two copies of that gene, would just have one, on this chromosome, and would not have any copy of that gene on this chromosome. So for regions of the genome that are affected by copy number variation, we may have, instead of two copies of a gene, one copy, or if it's a duplication, we may have three copies instead of two. So that makes a lot of variation in the genome, and it exposes genes that are sensitive to dosage. In other words, for some genes it's important that you have two functioning copies; for other genes one is certainly adequate, so they're relatively insensitive to dosage. We don't really know how many genes are dosage sensitive, but we think maybe a few percent of genes are dosage sensitive. 
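To make the two main kinds of variation concrete, here is a small Python sketch under stated assumptions. The function names are hypothetical, the second allele string "GAGCA" is just the slide's G-A-T-C-A with the T swapped for a G as described, and the per-chromosome copy counts are a toy model of deletion and duplication rather than any real genotyping method.

```python
def snp_positions(allele_a, allele_b):
    """Return the 0-based positions where two equal-length alleles differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be the same length")
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


def gene_copies(maternal, paternal):
    """Total copies of a gene, given how many each chromosome carries."""
    return maternal + paternal


# SNP: the slide's example, a T in one form of the gene and a G in the other.
print(snp_positions("GATCA", "GAGCA"))

# One SNP per ~1,000 bases across a ~3 billion base genome.
expected_snps = 3_000_000_000 // 1_000
print(f"{expected_snps:,} expected SNPs per haploid genome")

# CNV: copy counts for a normal pair, a deletion, and a duplication.
for label, mat, pat in [("normal", 1, 1), ("deletion", 1, 0), ("duplication", 1, 2)]:
    print(label, gene_copies(mat, pat))
```

The dosage point follows from the last loop: a dosage-sensitive gene is one where having one or three copies, rather than two, changes the outcome.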
For deletions, the other way that this can be important from a medical point of view is this. If you have some normal variation in a gene on this chromosome, that variation may not be very important, because you have two copies of the gene. But if you have a CNV over here that deletes a copy of the gene, then variation on this chromosome that normally is not too significant becomes more significant, because that is the only copy of that gene that you have. So for deletions, it exposes otherwise normal variation on the remaining allele. And you can have fusion of genes where the junction occurs or the repair of the duplication occurs. So there are lots of ways in which copy number variation can perturb genetic function, and not surprisingly, as we've appreciated this, we've found that this is a rich source for producing human disease. The bottom line of all this is there's a lot of variation in our genomes. In fact, in 2007 Science magazine said that the breakthrough of the year was human genetic variation. And so we know that there are about 30 million single nucleotide polymorphisms in our species, and about 3 million differences between each individual and the reference sequence. In terms of copy number variations, there are 3 to 7 large copy number variations per individual. About 5 to 10% of us have a copy number variation bigger than 100 kb (the average gene is 30 kb), and 1 to 2% of us in this room have a copy number variation bigger than a megabase, which could affect 10 or 20 genes. So there's lots of variation, both at the single nucleotide level and at the copy number variation level. So in fact different members of our species are genetically different, even though we only differ on the order of one base pair per 1,000 bases. So there's a lot of genetic variation. Now, the last thing I want to say in this category is about the advances in technology, which I think many of you have heard about, but I'll just use this single slide to emphasize the 
rapid change in DNA sequencing technology that's gone on since 2003, when we said we'd finished the genome project. So down here are years, and this is 2000 over here, this is 2010 over here. Just pay attention to this red line, which is the cost per million high quality base pairs of DNA sequence. So up here, at the start, in 2000, it was about $10,000 per million base pairs, and you see the curve has come down, so that in 2005 it was about a thousand, in 2006 it was about 100, in 2008 it was 10, and in 2010 it was $1. So the cost of DNA sequencing is coming down, down, down very rapidly. And not shown in this slide, but perhaps you can get it from the rate of accumulation of sequence here, the throughput is actually going up and up and up. So the technology is advancing so that we can sequence DNA faster and faster, more and more accurately, and cheaper and cheaper. So DNA sequencing is becoming a very practical tool to enumerate the genetic variation that we just talked about and to begin to understand the genetic differences between people. Consequently, we've begun to see, in the literature and in other places, the availability of DNA sequences of single individuals. These little figures here show that. By the end of 2010 we had about 25 or 30 individuals whose whole genome sequence was available, and it's estimated that at the end of 2011, so we're one month away, there'll probably be on the order of 30,000 whole genome sequences of particular individuals available in various databases. So DNA sequencing is really making a huge impact in enumerating the genetic variation. What we have to learn is how to interpret all that. So that's all I'm going to say about the genome project, genetic sequence variation, and technology, and let's turn to make one point on what I've called whole genome sequence biology. So it's interesting that, remember, I said at the start of the genome project there was an argument about whether or not it would be useful and would it 
stimulate research and would we get anything from it. And now, some 20 years or so later from when those arguments were going on, any biologist who's studying any species wants to have the whole genome sequence of their favorite organism. So it's a complete flip in the mindset, and it's hard to keep track of. I use this tree of life to keep track of it. We have whole genome sequences from eukaryotes, animals like ourselves; from bacteria, the prokaryotes; and from members of the archaea, which is the third kingdom of life, which we only recently found out about. It's really pretty hard to be sure, but I think that we have certainly more than 2,500 organisms whose whole genome sequence has been obtained and deposited in various databases. So we've gone from arguing about whether it's useful to now everybody's got to have it and use it for their favorite biology, and it's turned out to be a very potent stimulus for understanding biology, and the pace continues. The other thing that's important to realize is that the sequence that's used, the protein coding language, really holds true across all biology. So once you have a sequence of a eukaryote, you can use that sequence information to go look for the corresponding genes in organisms that are evolutionarily as far removed as bacteria. So the DNA sequence provides a language of biology that allows us to look at what particular genes do across all biology, and so we gain a huge amount of information by having that universal language across all biology. OK, now that's all I'm going to say about whole genome biology. Let's talk just for a minute about evolutionary thinking in medicine. Now, when I went to medical school, evolution was not mentioned. I think in the whole four years I was in medical school, I doubt that the word evolution was ever uttered. And if you asked me about evolution, and I was very interested in biology, I would immediately start thinking about dinosaurs and fossils and things that were pretty far removed from medicine, but 
as Dobzhansky said, nothing in biology makes sense except in the light of evolution, and I think now nothing in medicine will make sense except in the light of evolution. We are part of the biological kingdom; we are the end products of evolutionary biology. So what do I mean by this? Well, if you start looking at how evolution works and then think about what it means for medicine: for evolution, a central theme is variation, and I've just shown you that we're now focusing on human genetic variation. So, the centrality of variation in terms of how things change over time. Then the continuity and consequences of natural selection. That is to say, natural selection is going on all the time. We all partook of a little natural selection when we went outside and had that very great breakfast that was served up to us, in terms of the caloric intake, the kinds of nutrients that we exposed ourselves to, and so forth. So natural selection's going on all the time. Biological systems have developed mechanisms by which they respond to environmental changes to evolve, and that's turned our focus to systems biology, putting organisms back together; instead of using reductionism, actually using an integrative approach to understand biology, as shown here. And an emphasis on individuality, because if you look at how selection works in whatever species you're thinking about, the selection actually occurs on individual members of a species. So that is what goes on in our species as well. That selection, in other species, we applaud, because it serves to shape biological characteristics in different species. In our species, the people who come up on the short end of the stick for natural selection are the patients that come to see you with medical problems. So it's natural selection that's going on in our species; we care about those individuals, while in other species we don't worry about that. If you're interested in that, there's a review of this in PNAS in 2010 about evolutionary biology and medicine. So from a point 
of view of what we've learned from the genome sequence in evolutionary biology, it's really been very interesting, because we can look and see how we compare with our closest living relatives, the chimps. Remember, I said we were 98.5% identical, so it's interesting to know how we differ, what makes us different from the chimps. We can even now sequence our closest relative ever, which was Neanderthal. So now the genome of Neanderthal is here, by Svante Pääbo and his colleagues, and you can ask, okay, what are the major differences between us and Neanderthal? And if you enumerate them and lump them together, it turns out there are a bunch of genes that show sequence variation between us and Neanderthal that are involved in energy metabolism. There's another bunch of genes that are expressed in the nervous system and are thought to be important in cognition. There's another bunch of genes, one that I'm particularly interested in, that are involved in neurodevelopment. And then the last category that's particularly variable is microRNAs, these new RNA molecules that are important in regulating gene expression. So you can begin to get the idea of what it is that has changed over evolutionary time to allow Homo sapiens different properties and different characteristics as compared to Neanderthal, our last relative. So that's evolutionary thinking in medicine. Let me just say a word now about disease gene identification. If you look at when disease genes were first identified, it was roughly between 1900 and 1910. There was some knowledge of color blindness before then, but we began to think about genes causing specific human phenotypes in that first decade of the 20th century. But progress was very slow, and I've plotted it here. This is a modification of a plot that originally was published by Joe Nadeau. Just look at this pink curve, on this scale over here. What I've plotted is the identification of genes that are responsible for rare Mendelian
conditions. So these are things like PKU, cystic fibrosis, Marfan syndrome, LDL receptor defects, all of those strong phenotypes that are inherited as Mendelian traits, exactly the way that Mendel showed in pea plants. And so you can see the number has gone up pretty dramatically, and currently there are about 2,600 genes in the human genome that have been shown to have variations that account for particular human diseases. We'll come back to the common complex traits later, but that's on a different scale, you see here, so we're way behind on common complex traits. So let's focus for a minute on the Mendelian disorders in this category. You can actually look at the progress, and there's an online resource called Online Mendelian Inheritance in Man. This was started by Victor McKusick at Hopkins, and currently it's maintained by Dr. Ada Hamosh and her team at Hopkins. And it currently lists, here I say 2,500, but today's count is around 2,650. And there's another online resource called GeneTests that tracks the number of these genes that can actually be measured or sequenced to make a diagnosis, and it's about 2,000. Now, this plot, which came from an article by Art Beaudet (you can't read it, I guess, here), looks at the number of genetic tests going from 1990 up to the year 2000, and you can see this tremendous increase, so that here we had fewer than 50 genetic tests and now we have about 2,000 genetic tests. This is causing a radical change, at least at Hopkins, in the way pathology deals with this. So a patient is seen, and the doctor wants to send off a test for some very rare disorder, and the pathology department has to find a laboratory that does the test and make sure that they're certified and so forth. So one of the things we're wrestling with is how to modernize the way we handle requests for genetic tests and how we interpret those results. Molecular cytogenetics, the copy number variations, is moving along. So we're making a lot of progress
in this whole effort. Now, the one thing I want to point out, though, is that although this number is big, it's only about 15% of the total number of genes. So we have a lot of work left to do. There's no reason to think that the other 85% of genes won't also have Mendelian phenotypes associated with them. If you're interested in keeping track of this, I urge you to go to this catalog, Online Mendelian Inheritance in Man, that I already mentioned. It's very user friendly. You go to www.omim.org, and it has a search box here on the first page, and you can punch into that search box either a disease name or a clinical symptom. So here I've written in Marfan syndrome, and I get a list of entries, and those entries include fibrillin; that's the gene that's responsible for Marfan syndrome. Here's the clinical phenotype, Marfan syndrome. So if we're in the clinic and we see a patient we think might have Marfan syndrome and we want to look up the clinical features, we put in Marfan, we click on this when we get it, and I'll show you what you go to. Or if we want to learn something about the molecular biology, we put in Marfan syndrome, it comes up with a gene, we click on this, and we go to the gene. And these are other syndromes that have similar, overlapping features. That's if you put in Marfan. You could also put in not the name of the syndrome but just some clinical features. So here I put in tall stature and dislocated ocular lenses, and you'll see I get pretty much the same thing. The first thing that turns up is fibrillin; that's the gene responsible for Marfan syndrome. The third entry is Marfan syndrome. The second thing that comes up on the list is homocystinuria, which is a phenotype that is very similar to Marfan syndrome. So it works as a diagnostic tool for trying to figure out what your patient has based on the clinical findings that you observe. If you actually go to the entry, so here I've gone to the entry for Marfan syndrome, there's a long, long entry. I've just
shown you the top of it, but it describes the history, the clinical features, and so forth. You have this table of contents over here, so if you're just interested in how to make the diagnosis, you pull that down, click on that, and it will tell you how to make the molecular diagnosis, where to send the test, and so forth. Or if you want to know whether there are animal models and what we have learned from them, you just click on that. So it's a very useful tool. It's free and very easy to use. Just go to omim.org and try it out. Now, what about identifying the genes not for these rare Mendelian disorders, but identifying the genetic variants that contribute risk for common complex traits, like diabetes, which we already mentioned, or coronary artery disease, or neuropsychiatric disease? So here, notice that the scale is different, and the progress has been very slow, although recently it has spiked up tremendously. So we now have variants that we think are responsible for, or contribute risk for, at least 200 of these phenotypes. By and large, these variants are not causative; they're actually susceptibility variants that either increase or decrease one's risk for a particular phenotype. And the method that's used for this, as I'm sure you've heard, is called genome-wide association studies, or GWAS. These studies are agnostic approaches that identify SNP markers enriched in cases as compared to controls. So typically, if you want to do this, you have a large group of cases and a large group of controls, and you do that SNP genotyping that I mentioned earlier across the whole genome, and you look for particular SNP genotypes that are enriched in your cases compared to your controls. And it's agnostic in the sense that it makes no assumptions about what the genes are or what the variants are that are responsible. The only assumption it makes is that somewhere in the genome there's a variant that contributes risk for it. So it finds stuff not by looking under the lamppost but by looking across the whole
genome without any sort of preconceived notions. That's a very powerful aspect of this that's been very informative to us. And so you find SNP markers, and then you look around those markers and you try to find the causative variants that are actually responsible for the change in susceptibility. And once you find those causative variants, that gives you a particular gene and tells you something about the biological perturbation of that gene's function that increases or decreases the risk for a particular trait. And to the extent that you know what biological system that gene product works in, it identifies a biological system that is perturbed and that changes your risk for a particular disease of interest. So this agnostic approach has proved very powerful in terms of illuminating biological systems that are responsible for certain phenotypes, where we had no prior knowledge that they played a role. In the case of type 2 diabetes, years ago we all thought that it was insulin resistance, that it was a peripheral problem, that the peripheral tissues were resistant to insulin. There is some degree of that, but it turns out that the variants that contribute risk for type 2 diabetes are in insulin production, not in insulin resistance. And then, of course, understanding the pathophysiology gives us a better way of dealing with the patients. This is sort of a diagram of how this might happen. This is from a paper by Teri Manolio at the genome institute, and it is in this series (I'll come to this at the end) in the New England Journal that Greg is one of the editors for, and it really is a wonderful collection of papers on state-of-the-art genomic technologies. This diagram, I don't know how well you can read it, but it shows a region of the genome, and maybe the distance between these two single nucleotide polymorphisms is 1 kb, and it shows it in three individuals. So you get the genotype of this SNP and the genotype
of this SNP in these three individuals, plus a whole bunch of individuals, and you look at the frequency of those genotypes in your cases and compare them to controls. And let's say in the cases the particular variant is more common, and so you see more heterozygotes and more homozygotes for that variant. You might want to do a replication study in a different population to make sure it's not something due to population stratification. In the end, then, you plot out all of those variants and you ask, are there any variants that are associated at a statistically significant level with a particular disease phenotype? This has come to be called a Manhattan plot, because it looks like the skyline of New York. All the variants on each chromosome are color coded, and you see they're all more or less clumped around the bottom here, except in one region on chromosome 9 there's a bunch of variants whose p-value is exceedingly low, that is, here, p less than 10 to the minus 8. And so they're statistically significant even with all of the tests that one has done. So that says that in this region of the genome, defined by these two markers, there is some variant or some set of variants that contributes risk for this particular disease. So now we go look at that region very carefully, identify the causative variants, and move forward in our understanding of the biology of the disease. If you're interested in this, NHGRI maintains a great website, and here's the whole genome and all of the variants that have achieved statistical significance for all these phenotypes. This was up to date as of March 2011; I think there's a newer version online now. The interesting thing is that many of these variants, as I've already indicated, tag genes or biological systems that we did not previously know were important for particular disease phenotypes. The other thing that we've noticed is that most of these variants for the common complex traits are not in protein coding space, not
in those exons that we talked about, which are usually hit for the Mendelian disorders, but are in fact in regions of the genome that seem to regulate gene expression. So you remember the little diagram of a gene I showed you: there were upstream sequences in the promoter, or more remotely related to the gene, called enhancers. And we think that most of the variants that are involved in this actually perturb gene regulation, so they're in the non-coding regulatory regions of the genome. That's important because the state of the knowledge is weaker in terms of understanding regulatory variants as opposed to protein coding variants, so it's an important area of research going on right now. And in aggregate, if you look at a lot of disease phenotypes, we haven't found all of the variation yet. So much of the heritability, that is, the genetic variation that contributes to a phenotype, remains to be explained. That has come to be called the dark matter. The fraction explained by the variants so far identified for particular complex traits may vary from as high as 60%, which is probably where we are for age-related macular degeneration, to less than 5%, which is probably where we are for type 2 diabetes. So for some disorders we've only explained a small fraction of the genetic variation; for other disorders the methods so far have allowed us to explain quite a substantial fraction of the genetic variation. So this comes to the question, then: if we've found variants but they only explain a small fraction of the risk, people have said, well, what have you learned? Well, one thing you've learned is you've identified biological systems, and those systems become important to study to understand the mechanisms of the disease in a more complete way. The other thing is that the risks that we calculate by present methods, I think, are underestimates for a variety of reasons. So the common sort of critique of this is that the risk allele at this single nucleotide polymorphism confers a risk to individuals that is only 1.2 times
greater, let's say, for example. So it's hard to change medical management, it's hard to get a person to change their lifestyle, based on a tiny change in risk. So we have to do better than that. Now remember that these risks are calculated in populations; they're not calculated in individuals. So for a given individual, a particular variant may be much riskier, or in fact it may not be risky at all. We're just talking about the risk across populations. So we have to learn how to individualize these risks, and we have to recognize that these variants work in complex biological systems. So this shows a complex biological system, each dot representing a protein product, and the interactions between these protein products represented by the lines connecting the dots. So the systems are complicated and involve many components. So what we really need is not to look at a particular variant; we need to learn how to look at sets of variants characteristic of a particular individual and also integrate that with the environmental exposures of that individual to really calculate an individualized risk. And we just are not able to do that yet, although I must say that people are making considerable progress in developing new analytical methods, a more biologically based set of analytical methods, I would say, to really calculate accurate individualized risks. And in fact, one strategy that's very recently been applied, and that's turning out to identify much greater risks, actually, is looking not at the clinical phenotype but at biochemical markers. So this has come to be called metabolomics. And this is a study that was just published that looked at about 3,000 individuals, used the whole-genome SNP genotyping that we've talked about, measured about 250 metabolites very precisely in these individuals, and found 25 loci with effect sizes explaining anywhere from 10 to 60% of the biological variation for those small molecules. So it suggests that if we sharpen up the phenotype
that we're looking at, in this case measuring a biological marker, the risk will be much more predictive, much more significant. So this is just the top of that list of all the variants, and you can see p-values here on the order of 10 to the minus 250. So these are highly statistically significant variants that influence the level of these metabolites. The metabolites in turn are involved in a variety of complex traits. So we're making a good bit of progress in this area. I'm not going to say anything more about identifying variants for complex traits. I just want to say a word about individual genome sequencing and then what this means currently for the practice of medicine. So, individual genome sequences. We've already mentioned that the first one that was published was Craig Venter's, and I think that's because part of his genome sequence was what his company sequenced in the race to get a whole genome. So we had the reference sequence, but of course that was an anonymous person. What you really want to know is, what is the sequence of my patient? So that's why I think obtaining the sequence of individuals really is a change in the way we look at patients. So for example, Venter had 4.1 million variants as compared to the reference sequence. That included 3.2 million SNPs and about 300,000 copy number variants. There were 90 inversions, and the total amount of sequence covered by the variants was about 12.3 megabases, a huge chunk of the genome. So to put it in personal terms, you could look at this individual and you could look at his lactase genotype and ask, you know, is he someone who can tolerate ice cream or not? You could look at his dopamine D4 receptor (DRD4), which is associated with risk-taking behavior, and you probably could have made a guess about Craig Venter's risk-taking behavior genotype before doing it, but you can actually make that measurement. Or you can look at his APOE genotype and understand his risk
for Alzheimer's disease, whether or not he has an increased risk. So it's a completely different level of information about individual patients. So I think all of these changes and advances will have profound effects for medicine. This is a picture of a Dr. Ceriani, who was rounding in Kansas in the middle of the 20th century. The picture was taken by Eugene Smith. This is sort of the idea that I had of what I'd be doing when I decided to go into medicine; of course, it's far from what we do. So we have sort of summed it up in terms of developing what might be called the science of the individual: how we're going to use this information to understand our individual patients. So what have we learned about the science of the individual currently? First of all, it exposes the pitfalls of typological thinking. In other words, remember those kids I showed you, where you say this is an example of a certain disease. Rather, we think this is a patient who has features of a particular disease, and we understand that no two patients have exactly the same manifestations of that disease, and no two patients will have the same responses to our attempts to treat them. So it confirms what was in the past called the physiologic view of disease: each individual has their own disease. It emphasizes the importance of asking, why does this particular patient have this particular problem at this particular time? So it turns the focus more on trying to understand why people get sick and what we can learn from that exercise in terms of managing the patients as we go forward, in terms of the best treatment for this particular patient. However, moving this into medicine, making it practical, and bringing it to the clinic and to your offices is a challenge, and you all understand that. It's interesting to look at a paper that came out recently that attempted to do this. This is a scientist called Steve Quake. He had a relative who dropped dead of sudden cardiac death in his 20s. Here's
Quake over here. So he went to his cardiologist out at Stanford and he said, look, I have this relative who just dropped dead in their 20s, and I want to know if I'm at risk for that. So that's a reasonable question to ask. So they got a big pedigree, and they went ahead and sequenced his genome, and then they tried to use that information to give him a more informed understanding of his risks, not only for sudden cardiac death but for other common medical problems. And it turned out that was a really daunting exercise. It took all of these people. Here's Quake; he got to be an author on his own sequence. Here's the person who led the study. There's a cardiologist, there is one medical geneticist, and one genetic counselor. It took the genetic counselor five and a half or six hours to sit down with Stephen Quake, who is a very accomplished molecular biologist, and explain to him all the variation that was found in his genome. So you can imagine doing that exercise with less sophisticated individuals. And in the end, at the current state, most of the information we could give him changed his risk for certain things in modest ways. So it did not really overnight change how Quake would be managed, and certainly did not change much beyond what we would have done from having his pedigree. On the other hand, we're learning stuff about how to use this information virtually every day. So I think going forward we will increasingly learn how to use this information in a much more effective way, and I would support that argument with these examples. First of all, to do this will require rigorous research of the kind the genome institute and Hopkins are doing, at the basic level, at the translational level, and at the clinical level. New technology continues to accelerate the pace. And it's not going to happen overnight; it happens gradually. Let me give you these three examples. First of all, acute lymphoblastic leukemia. When I was a house officer in the late '60s and
early '70s, acute lymphoblastic leukemia was the most common form of childhood leukemia. It had a 95% mortality rate. Nowadays, acute lymphoblastic leukemia remains the most common childhood leukemia, and it has a 95% survival rate. So it went from 95% mortality to 95% survival. So what accounts for that change? Actually, if you look at it, the medicines that are currently being used are very similar, if not identical, to the medicines that we used all those years ago. So it's not the kinds of medicines that are being used. What it is, I would argue, is that oncologists have learned that this diagnosis, acute lymphoblastic leukemia, is actually a heterogeneous group of disorders, and they've learned how to use gene expression profiling, age at onset, DNA sequence variation, and other tools to subdivide the patients. In other words, they move from one collective diagnosis to subcategories of diagnosis, moving towards individualizing the diagnosis for individual patients, and then adjusting their treatment according to which subdivision the patient falls in. And that approach, a more informed approach in terms of differences between individual patients with the same diagnosis, has had a dramatic effect on the consequences of having ALL. The same is true, but to a lesser extent, for sickle cell disease. You all know that there are patients with sickle cell disease who are very sick from infancy forward, and there are other patients that just have an occasional crisis, maybe once a year or once every few years. So there's tremendous variation among individuals with sickle cell disease, and recall that they all have exactly the same genetic defect at the disease gene locus; they all have exactly the same mutation in beta-globin. What makes the difference between one patient with sickle cell disease and the next? Increasingly, we're finding modifying genes that modify the phenotype of sickle cell disease, and we can define a subgroup of sicklers that are much more likely to
develop, let's say, certain very disastrous complications of sickle cell disease, such as stroke and so forth, and we can manage that subset of patients with sickle cell disease more aggressively when they're at risk for developing a stroke, let's say. So we're individualizing therapy for sickle cell disease, and that's having better outcomes. Recently, the genome scientists are sequencing tumors, so there's a lot going on now about sequencing individual cancers and the people who have the cancers. And one of the interesting things that's come out, first identified by Bert Vogelstein at Hopkins, involves glioblastoma multiforme, the most serious brain cancer. It turns out that a small fraction of glioblastoma multiformes had a mutation in isocitrate dehydrogenase (IDH1), a gene that encodes an enzyme in the citric acid cycle. And it turned out that you could stratify the patients in terms of whether or not their tumor had an IDH1 mutation, and if you did that, it turns out that the patients with the mutations in their tumors behave differently than the patients that don't have those mutations. So we're moving again towards stratifying a diagnosis, moving to individualize the diagnosis, and adjusting our treatment and our thinking about the patients accordingly. So this is going on over and over again, and it'll go on rapidly in some areas and more slowly in other areas, and eventually it will lead us to a very individualized approach. Here is an example that I find sort of clever. This came from deCODE Genetics in Iceland, and they said, if you look at baseline PSA levels, there's actually evidence that the genetic makeup has a big effect on your PSA level. So currently, as you know, we use this standard cut point for PSA of 4, but if you look at normal individuals, for some their PSA is actually a good bit below 4, and then other normal individuals have a PSA above 4. So this 4 is sort of an average cut point. So they argued that, let's say you measure genetic variation at 6 loci they
recommended, and then you adjust the cut point for the individual based on their genetic makeup, so that 4 would actually be too high for some individuals, and for other individuals it's acceptable. So you individualize the risk that you determine with the PSA level, and that gives you a more informed way to deal with the patients. Now, time is short. I'm not going to say anything about pharmacogenetics except that it is a classic gene-by-environment interaction. The environmental variable in this case, though, is very well defined: you know the drug, you know the dose, you know when the patient started it. And not surprisingly, there's a lot of genetic variation that influences response to drugs. So that's an area that's going to go forward very quickly, and it already has had numerous positive effects. Time is short and I won't talk about it, but there are variants that influence your response to statins or your response to treatment for hepatitis C and so forth, and these variants tend to be variants of quite large effect. So that's an area where the variation really has turned out to be very important for the phenotype. The end result of all of this, I would argue, will get us to this point. So this is a painting by Sir Luke Fildes of the doctor looking at his patient, and this is what we would like to do. We would like to understand our patient. We'd like to look at that patient and use not only our history and our physical exam but knowledge of the genetic makeup and the patient's environmental history to really understand the patient at a level that is far better than what we currently can. So I think over the next few years you'll see tremendous progress in this approach, so that we can think of our patients not as representatives of a particular disease but as individuals who have a particular set of problems. So with that I'll close. Thanks for your attention. Let me give a plug to this set of articles, which you can find in the New England Journal, Genomic Medicine: An Updated
Primer, and Greg is one of the editors. The one that came out this week is called Genomics and Cardiovascular Disease; quite good. And I should also acknowledge a heavy dose of Barton Childs, shown here, now deceased, who spent his whole life really thinking about how we could incorporate genetic knowledge into making management of our patients more effective and more individualized. Thank you. So, realize that you probably have to get off and [inaudible]. Probably a number of questions. Do you have a sequence of embryonic stem cells that completely represents the fully developed cells, or can that sequence really be modified at that early stage? So the question is, have people sequenced embryonic stem cells, and what's different about that sequence, and can you manipulate it? So that touches on a whole area which I did not say a word about, which is epigenomics. So if you look at the sequence, let's say, from a particular individual, and you could develop that cell line and then follow the individual over their lifetime, the sequence would remain the same. We were born with a sequence that was put together at the time the sperm and the egg formed a fertilized egg. But what is different, if you look at an embryonic stem cell versus cells in the adult, is the epigenomic imprint. So this is the pattern of regulation of genes. So the way I think of it is, if you look at, let's say, the liver in an adult, when you have a liver cell and that liver cell divides, you get two liver cells. If you look at, let's say, a muscle cell and that muscle cell divides, you get two muscle cells. And yet the genetic material in those two cells is the same. So what's different about those cells? The reason one cell is a liver cell and one cell is a muscle cell is that there are these programs of regulation of gene expression that are turned on and turned off. And so in the liver you turn on a program that's necessary for making liver cells and you turn off everything else. In the
muscle you turn on a program that's necessary for muscle cells and turn off everything else. Those programs of regulation of gene expression are called epigenetics. And so what we would see in an embryonic stem cell is a much more non-committed epigenetic set of regulations, and as the cell differentiated into different cell types, the epigenetic patterning of the regulation of gene expression would become established to move the daughter cells that derive from that embryonic stem cell down the developmental pathway to the various pluripotent outcomes that we would expect. When you go to the next generation, all of that has to be erased, because you start not with a collection of liver cells, muscle cells, and brain cells; you start with a single cell that then has to be pluripotent to become all other cells. In type 2 diabetes, you mentioned that in the peripheral tissues there is no problem. Is there no difference between type 1 and type 2? I didn't say there was no problem, but you make a good point. What I meant to say is there certainly is an element of insulin resistance, but it turns out that equally important, if not more important, in type 2 diabetes are various aspects of insulin production. Type 2 diabetes is different from type 1 diabetes, which is a more pure dropout of the beta cell. Basically correct. So the question is, where is the limit of this curve that has to do with the cost and throughput of DNA sequencing? And I don't know. We're not even close to the limit. So you know that some years ago Francis set the audacious goal of a thousand-dollar genome, and certainly you can order a whole genome on a patient, let's say at Hopkins, for about four thousand dollars right now. So that's pretty darn close to the thousand-dollar genome. You can do a whole exome, you can look at the exons, for about a thousand dollars now. However, that gives you sort of a preliminary analysis of that sequence. It does not give you a
sophisticated analysis of that sequence. And in fact there was an article in the New York Times yesterday pointing out that the really expensive part of genome sequencing, particularly for what we're interested in, is the analysis, and that is coming along at a slower pace. If you have to factor in how much it costs to pay the people to do the analysis and so forth, it's more expensive. But there are new technologies available. Compared to the current next generation, there's already a next-next generation that's clearly coming down the pike, and that will clearly lower the cost and increase the throughput. So I think the thousand-dollar genome will easily be surpassed in the near future. And what I tell patients and medical students is, of course, if you come to Johns Hopkins (I don't know how it is here at Suburban), if you have some complicated problem and you come to Johns Hopkins at nine o'clock in the morning and go home in the afternoon, you're going to blow a thousand dollars very fast. It's in the range of everything else. I mean, you may not even be able to get out of the parking lot for that. Yes? So the question is, how close was the Neanderthal genome to Homo sapiens, and would they be interfertile? First of all, the sequence quality of the Neanderthal is nowhere near the sequence quality we have for Homo sapiens, but the best guess, I think, is it's a little bit better than 99 percent identical. And people were very interested to know (actually, I don't know why we're so interested to know) whether there was any interbreeding between Homo sapiens and Neanderthal. And the genetic evidence that we have right now suggests yes, there was interbreeding between Neanderthal and Homo sapiens, and we cohabitated. And it seems to me pretty likely; that's where I would have bet before we had the genetic evidence, human nature being what it is.
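
As a rough illustration of the case-control comparison behind the genome-wide association studies described in the talk, here is a minimal Python sketch. It assumes a simple allelic chi-square test on a 2x2 table of allele counts at one SNP, and it uses the conventional genome-wide threshold of 5e-8 as an assumption (the talk quotes p less than 10 to the minus 8 for its example). The function name, the threshold constant, and the counts are hypothetical and are not taken from any study mentioned in the talk.

```python
import math

# Assumed conventional genome-wide significance threshold (not from the talk).
GENOME_WIDE_ALPHA = 5e-8


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns (chi2, p_value, odds_ratio) for one SNP, comparing how often
    the alternate allele appears in cases versus controls.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / product of margins.
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * cross ** 2 / (row_case * row_ctrl * col_alt * col_ref)
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each).
chi2, p, odds = allelic_association(1200, 2800, 1000, 3000)
print(f"chi2={chi2:.2f}  p={p:.2e}  OR={odds:.2f}")
print("genome-wide significant:", p < GENOME_WIDE_ALPHA)
```

With these made-up counts the odds ratio is modest, in the neighborhood of the 1.2-fold risks discussed in the talk, and the p-value does not clear the assumed genome-wide threshold even though it would look convincing for a single test. That is the multiple-testing point behind the Manhattan plot: only signals that survive correction for all the SNPs tested are followed up.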