I changed the title to "Engineering Healthcare Systems" because I'm going to tell you a story about the way we're doing it, and the way we're doing it is influencing the way other people are doing it, in part because of NHGRI support. I have to start by thanking the organizers for inviting me to be the mouthpiece for this electronic medical records kind of effort. So this is the map that you've seen before, and if you haven't, I don't know what rock you've been hiding under. I'm going to talk about this space right here, genomic predictors of disease susceptibility and drug response. Then I'm going to talk a little bit about engaging the electronic medical record for discovery purposes, not for patient care purposes, and then I'll talk a little bit about engaging the electronic record for delivering genomic medicine. There are many definitions of genomic medicine. I'm part of a working group at NHGRI that has spent an inordinate amount of time working on these words. These are the words, for better or worse, and when I argue with Eric about changing them, I get told that the words are done and to work on something else. And I think that this is a very reasonable definition; I don't want to say that it's not. I will say that genomic medicine is part of a greater vision of what people are calling precision medicine or personalized medicine. I like personalized medicine better than precision, but that's a debate we can have later: precision may overpromise, and personalized means you're taking care of a single patient. I'll come back to that theme later. Any clinician who wants to give a talk has to quote or at least acknowledge William Osler. I'm a Canadian, and William Osler was a Canadian, so I have to acknowledge him twice. He said that the good physician treats the disease; the great physician treats the patient who has the disease.
So it is an important part of what we do, and it has been an important part of medicine all along. The addition of genomic information just makes it that much more complicated, but also that much more personalized. Here's one view of the vision, published in the New Yorker in 2000 when that famous press conference happened. This woman is handing her sequence to a pharmacist, not to a physician but to a pharmacist, so it emphasizes that this is going to be a team sport. She also has it written down on a piece of paper, and that is almost certainly not the way it's going to work. He's pretty confused, and that certainly is the way it's going to work. I was of two minds whether to show this next part. When Francis Collins was appointed director of NIH, he was asked about a lot of things, including pharmacogenomics, and this is what he had to say, in one paragraph, because pharmacogenomics is the easy stuff. You can read it. It says that if everyone's DNA sequence is already in their medical record, it's simply a click of the mouse to find out all the information you need; there will be a lower barrier, and then wonderful things will happen: improved outcomes and fewer adverse events. Obviously, those of us who play in this space really buy into that idea, but I'll say this, as I've said before: I disagree with the word "simply," because it's anything but simple, as those of us who are trying to do it have discovered. I think the way this is going to happen is that some institutions, and then more and more institutions, will buy into this idea, and the question is how they are going to implement it. I think you have to implement by having excellence in basic science, and I'm going to come back to that over and over again. This is not unidirectional: the translation and the implementation feed discovery science.
There has to be a commitment to information technology, which cannot be overemphasized. We're very fortunate at my place to have a Department of Biomedical Informatics with 75 faculty members, which is pretty big. Then you put your health care system to work for discovery, and I'll talk a little bit about that. Once you've discovered something that you think is actionable, you can start to put it to work for patient care. The point is that this is an iterative process that goes back and forth. What we've done for discovery, in a nutshell, is create a biobank. We could talk about the biobank forever, but as of yesterday there were DNA samples from 163,941 patients in it, which is a pretty large number, and they're coupled to electronic medical records. Why do you need it so big, and what can you do with it when it's that big? I probably don't have to emphasize that to this audience, but I thought I would walk you through a question I was asked by one of our new faculty members. He asked: do you have any patients who have vitamin D levels? Do you have any patients who have vitamin D levels and GWAS data as well? One of the rules of the game is that once you do a GWAS on any of these samples, the data come back to the resource. So I went to a new resource that I'm proud to show off, and I would love to do this in real time: you go to a website, and this is what the web interface looks like. I type in vitamin D and drag and drop, it asks me what kind of vitamin D measurement I want, and I say any lab value. It turns out that in our electronic medical record there are 13,847 people who have vitamin D levels. There is a reason that this number and this other number don't agree with each other; it's not just that we add things up differently, and somebody can ask me about it afterwards. The number of those people who are in the biobank, which is a subset of the entire electronic record, is 5,497.
Then I can ask how many people have had GWAS genotyping. Right now it's more like 20,000, but only 10,000 are in this particular interface, and the intersection is 1,000 people. That's a big set, and you can get the GWAS data on these people with vitamin D levels essentially for free. So it's an enabling resource for discovery, but the point is that you start with 163,000 to get to this set of 1,000. If you started with 10,000, you'd get down to a set of 20 or so, which is a relatively useless number. So I think you have to have big numbers. What we're doing in BioVU is looking for genomic variants that are associated with all the things you might think of, and then we're doing an inverse experiment called PheWAS, which I'll tell you more about in a second. Rather than dwelling on the triumphs of BioVU itself, I'll just say that we're part of the Electronic Medical Records and Genomics (eMERGE) Network that NHGRI funds. These are the nodes in the current iteration of the network: there are nine nodes and 10 centers, and there's a reason, again, that the math doesn't add up perfectly. What we do is define phenotypes within the electronic medical record and then identify cases and controls in order to find genomic variants that drive those phenotypes. One of the things we've learned over the last five or six years of eMERGE-1 and eMERGE-2 is that writing and validating the phenotypes to find cases and controls is pretty challenging. We've gotten pretty good at it, I think; we're certainly very good at finding diseases. When I say "we," I don't mean me; I mean the informatics people who work on this. I'm just a mouthpiece. The next challenge is finding people who have a disease, followed by a drug exposure, followed by a drug response phenotype. That's more challenging, and we're working on that part as well.
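The cohort-building arithmetic above is a chain of set intersections: patients with the lab value, then those with a biobank DNA sample, then those with GWAS data. A minimal Python sketch, assuming each source can be exported as a set of patient identifiers; the function name `build_cohort` and the toy records are hypothetical and do not reflect how the BioVU web interface works internally:

```python
from typing import Dict, Set


def build_cohort(ehr_labs: Dict[str, Set[str]],
                 biobank_ids: Set[str],
                 genotyped_ids: Set[str],
                 lab_name: str) -> Set[str]:
    """Return patient IDs that have the lab value, a biobank DNA sample, and GWAS data."""
    # Step 1: everyone in the EHR with at least one result for this lab
    with_lab = {pid for pid, labs in ehr_labs.items() if lab_name in labs}
    # Steps 2 and 3: keep only those also in the biobank and already genotyped
    return with_lab & biobank_ids & genotyped_ids


# Hypothetical toy records
ehr_labs = {
    "p1": {"vitamin_d", "creatinine"},
    "p2": {"vitamin_d"},
    "p3": {"creatinine"},
    "p4": {"vitamin_d"},
}
biobank_ids = {"p1", "p2", "p3"}
genotyped_ids = {"p1", "p3", "p4"}
print(sorted(build_cohort(ehr_labs, biobank_ids, genotyped_ids, "vitamin_d")))  # ['p1']
```

Each filter can only shrink the set, which is how 13,847 patients with a vitamin D level become 5,497 in the biobank and about 1,000 with GWAS data.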
All the phenotypes are publicly posted in something called PheKB, the Phenotype KnowledgeBase, and this is what a screenshot looks like. All the phenotypes are listed there, along with who developed them and how well validated they are. What's interesting is the kinds of elements that go into them: billing codes, natural language processing, medications. There are all kinds of ways of identifying and validating a phenotype, and we go through a lot of hand curation to make sure the phenotypes work correctly. Those are there for anybody who wants to play in the electronic medical record space. I'm a cardiac electrophysiologist when I'm not doing this for a living, so I thought I would show you one eMERGE project that came out of an electrophysiology idea. We were interested in variability in the QRS complex on the electrocardiogram, this little deflection here that tells you how fast conduction is in the heart. There are reasons we think we should look at that. The first thing we did was develop algorithms to find patients who had a normal electrocardiogram, no heart disease, normal electrolytes, and no confounding drugs, really, really normal people. We deployed them across the entire electronic record, not just the subset with DNA, and found 30,000 people. Andrew Ramirez, who is in this audience and is now working on this campus, directed that effort at our place, and this is what the distribution of QRS durations looks like. These are entirely normal individuals, and we're interested in why some people are up at this end versus down at this end.
We did our genome-wide association study with eMERGE-1 support and got no signal. We then deployed that algorithm across the eMERGE network and got many more cases, because it's a case-only study, and this is what the Manhattan plot looks like. The signal is in what is actually a pretty good candidate region: this is the cardiac sodium channel locus. This is the cardiac sodium channel gene, and this is a different sodium channel gene that people have gotten interested in because of this kind of work, and it controls conduction in the heart. So that's all very well and good. We then did another experiment to validate this result, something we've called a phenome-wide association study, or PheWAS. In a GWAS, you take a phenotype (yes or no, if it's a discrete phenotype) and you look across 500,000 or a million or 14 million SNPs, doing a test of association at each locus. What we did was take 13,000 people who had been genotyped at this particular SNP, classify them as reference or variant (I'm not supposed to say wild type), and do a test of association with every single diagnosis that we have in the electronic medical record. There are about 1,000 of them, and we recognize there are overlaps across those phenotypes, but we have ways of making this more and more sophisticated. This is what the PheWAS Manhattan plot looks like for this particular SNP, which happened to be the top one on the GWAS Manhattan plot I just showed you. What's interesting is that these two dots here are arrhythmia diagnoses. You might say, well, he's an arrhythmia guy and he started with an arrhythmia question, so that's not a big deal; at the very least, we rescued the signal we started with. But remember that we started with people who had normal electrocardiograms, not with people who had arrhythmias.
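The PheWAS idea described above inverts a GWAS: one SNP is tested against every diagnosis in the record. A minimal Python sketch, assuming carrier status is a yes/no flag per patient and diagnoses are sets of codes per patient. It uses a simple 2x2 Pearson chi-square test per code as a stand-in; published PheWAS analyses typically use regression with covariates instead. All names and data are hypothetical:

```python
import math
from typing import Dict, List, Set, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def phewas(carrier: Dict[str, bool],
           diagnoses: Dict[str, Set[str]],
           codes: List[str]) -> List[Tuple[str, float]]:
    """Test one SNP (variant carrier vs. reference) against every diagnosis code.

    Returns (code, p_value) pairs, most significant first.
    """
    results = []
    for code in codes:
        a = b = c = d = 0
        for pid, is_carrier in carrier.items():
            has_code = code in diagnoses.get(pid, set())
            if is_carrier and has_code:
                a += 1
            elif is_carrier:
                b += 1
            elif has_code:
                c += 1
            else:
                d += 1
        chi2 = chi2_2x2(a, b, c, d)
        # Upper tail of a chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(chi2 / 2))
        results.append((code, p_value))
    return sorted(results, key=lambda r: r[1])


# Toy data: 50 carriers (p0..p49) and 50 reference patients (p50..p99).
carrier = {f"p{i}": i < 50 for i in range(100)}
diagnoses = {f"p{i}": set() for i in range(100)}
for i in range(20):
    diagnoses[f"p{i}"].add("arrhythmia")  # 20 of 50 carriers
for i in range(50, 55):
    diagnoses[f"p{i}"].add("arrhythmia")  # 5 of 50 reference patients
for i in range(0, 100, 10):
    diagnoses[f"p{i}"].add("gout")  # 5 carriers, 5 reference patients

codes = ["arrhythmia", "gout"]
threshold = 0.05 / len(codes)  # Bonferroni correction over all codes tested
for code, p in phewas(carrier, diagnoses, codes):
    print(f"{code}: p = {p:.2g}{' *' if p < threshold else ''}")
```

With about 1,000 diagnosis codes, dividing 0.05 by the number of codes is one simple way to account for the many tests; because the phenotypes overlap, that correction errs on the conservative side.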
They get arrhythmias later, because we have an electronic record that follows them for years and years. So what this says is that when you start out at one end of that distribution, you're more likely to develop an arrhythmia, and here's a genomic predictor of that. The reviewers then asked us to look at it over time, and there is a gene-dose effect over time on the development of atrial fibrillation. This is in a gene called SCN10A. SCN10A was originally cloned from dorsal root ganglion, so the way it affects cardiac conduction has been pretty controversial. One of the other hats I wear is that we study those kinds of problems in my mouse and fish lab, so we looked at what happens in wild-type myocytes. These are action potentials from mouse myocytes at baseline, and then when you add a tiny, tiny concentration of a sodium channel-opening toxin called ATX. That's what happens in wild-type mice. We have generated SCN10A knockout mice, and in them we don't see that arrhythmogenic effect, and that finding has in fact been reproduced. I throw this in to make sure people know that I still think about this kind of stuff every so often, and also to make the point that there is a loop that has to be closed. Everybody has said that we now have 3,000 more signals, or 3,000 more loci, to look at than we did 10 years ago, and we had better start looking at them, because maybe one of them is a drug target, for example. So we've deployed the PheWAS algorithm across the entire NHGRI-supported GWAS catalog. That's about 1,300 different tests of association. Some of them involve phenotypes that are not well captured in the electronic record, like whether your urine smells after you eat asparagus, or whether you're bald. The electronic record doesn't capture those very well, so we don't pay attention to them in our validation study. Here's an example of a highly pleiotropic SNP.
This is a SNP in IRF4 that determines skin color, but when you do the PheWAS, you get tremendously significant signals for various kinds of skin cancer as well as actinic keratosis. It turns out that SNPs that are highly validated and highly replicated in the GWAS catalog replicate this way as well, and we have about 70 new associations from using this approach to discover pleiotropy. This is what eMERGE-2 looks like. Don't take the numbers too seriously: for example, this one says 27,000, and as I said it's probably more like 20,000, but we count Immunochip and Metabochip data here as well. So there are 300,000, or 330,000 or so, people in eMERGE-2, and about 75,000 with dense genotypic data that we're putting together into a very large set. This highlights for me the paradox of personalized medicine. As a clinician, I have one patient in front of me in the office, and what I need to do is be able to treat that patient differently from the average. In order to do that, I need a very large data set to convince me that the different treatment is in fact justified. So that's the discovery piece; now the implementation piece. We've been hearing all day about pharmacogenetics; that's the easy stuff, and it's probably the first thing that's going to be implemented. I've been doing pharmacogenomics and pharmacogenetics my entire career, and we're part of the Pharmacogenomics Research Network (PGRN), another effort funded by NIH. Eric already alluded to the fact that many, many drugs now have labels that include pharmacogenetic information. This slide shows only the germline labels; the other half of the drugs with such labels refer to the tumor genome. Everybody says it's easy, and I like to show this: it is low-hanging fruit, but it's not so simple.
In the fourth quarter of 2009, we were tasked by our leadership with coming up with a way to start delivering pharmacogenomic information preemptively into the electronic medical record. We were given a year to plan. This is what the planning looks like, and I'm not going to walk you through any of it; if anything, just read down these items to understand that there are multiple communities you have to engage and excite in order to execute a project like this. We call this our PREDICT project, and this is what PREDICT stands for. The notion, in brief, is to find patients who are at high risk of getting one of the drugs with an actionable pharmacogenetic story, one of those 58 drugs. Then you genotype them not just for the drug you think they're going to get, but on a multiplex platform that assays many different pharmacogenomic variants. Then you do what I call the easy stuff: you store the genomic data, track the outcomes, and provide informatics support to clinicians who are prescribing the drug at the appropriate time. So who is at high risk? One high-risk group is the people who populate our internal medicine clinics. We did a study of about 50,000 people in the electronic medical record, asking how many of them were exposed over the course of five years to one or more of the drugs that have these FDA labels. The answer was a bit of a surprise: 65% of them got at least one drug from that list over five years, and 15% got four or more. That's one group of high-risk people, and we are including them in the PREDICT project. The other high-risk group is people we can look at and say: within the next week or two, you're going to get drug X. One of the best examples is the people going to the cath lab at Vanderbilt. We do about 4,000 catheterizations a year.
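The internal-medicine exposure question above reduces to counting, per patient, how many distinct prescribed drugs fall on the pharmacogenetic label list. A minimal Python sketch, assuming five-year prescriptions are available as a set of drug names per patient; the function `exposure_summary`, its `heavy_threshold` parameter, the short drug list, and the four patients are all hypothetical and illustrative:

```python
from typing import Dict, Set, Tuple


def exposure_summary(prescriptions: Dict[str, Set[str]],
                     label_drugs: Set[str],
                     heavy_threshold: int = 4) -> Tuple[float, float]:
    """Return (fraction exposed to >= 1 label drug, fraction exposed to >= heavy_threshold)."""
    if not prescriptions:
        return 0.0, 0.0
    # Number of distinct label drugs each patient received in the window
    counts = [len(drugs & label_drugs) for drugs in prescriptions.values()]
    n = len(counts)
    any_exposure = sum(1 for k in counts if k >= 1) / n
    heavy_exposure = sum(1 for k in counts if k >= heavy_threshold) / n
    return any_exposure, heavy_exposure


# Illustrative subset of labeled drugs, not the full FDA list
label_drugs = {"clopidogrel", "simvastatin", "warfarin", "tacrolimus", "mercaptopurine"}
prescriptions = {
    "p1": {"clopidogrel", "simvastatin", "warfarin", "tacrolimus"},
    "p2": {"warfarin", "aspirin"},
    "p3": {"aspirin"},
    "p4": {"simvastatin", "lisinopril"},
}
print(exposure_summary(prescriptions, label_drugs))  # (0.75, 0.25)
```

Applied to the real cohort, these two fractions correspond to the 65% and 15% figures quoted above.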
About 1,800 of those patients end up on clopidogrel. As we were planning this project, the FDA did us a favor and relabeled clopidogrel to include this statement: consider alternative treatment or treatment strategies in patients identified as CYP2C19 poor metabolizers. CYP2C19 is the enzyme that bioactivates the prodrug clopidogrel into its biologically active metabolite. I have to say here that it was Grant Wilkinson, a faculty colleague of mine, who discovered in the mid-1980s that CYP2C19 is polymorphic. He was not studying clopidogrel, and I remember that I and many other people gave him an incredibly hard time because he was studying an incredibly obscure drug in what seemed a pointless line of inquiry. It turned out he was right that it was an important thing to study, because now it's the centerpiece of much of what goes on in pharmacogenomic implementation. The other thing we did was take our BioVU specimens, find a group of people who had received a stent after an acute coronary event, and look at 30-day outcomes. We found 200 people with complications and 400 or 500 controls, and we replicated the known signal for the CYP2C19 variant in conferring risk. Over the last two and a half years, we've now studied 12,521 patients in PREDICT. Bruce Korf, in the video you just saw, described a program where you might genotype people and then deploy the information when it becomes relevant, and that's exactly what we're doing here. There are 334 homozygotes for CYP2C19*2 and 2,369 heterozygotes, and what's interesting is that most people don't have a common variant. We don't actually know how many people have a rare variant; we only know that they don't have a common one. And to give you a sense that, although this is complicated, it's even more complicated than you think: it's not just *2 that makes someone a poor metabolizer.
There's *3 and *4, and you can be *3/*4. There's also a *17 that nobody knows what to do with. The heterozygotes and homozygotes you saw on the pie chart actually comprise multiple genotypes, and how to translate from a genotype to a diplotype to a predicted phenotype is one of the challenges in this area. When a patient who is a poor metabolizer and has this information in their electronic medical record has an electronic prescription written for clopidogrel, this point-of-care decision support pops up and suggests two alternative drugs. We track how many times physicians look at this and how many times they change their minds, and we're just learning about responses to this kind of program. I thought I should show this picture because, again, it's about personalizing medicine. This is one of our interventional cardiologists, and when we started the program in September 2010, we were really eager to find the first *2/*2 patient, and this is she. She is being taken care of by him, and he knows a little bit more about her now, and that's personalizing medicine. But he has been taking care of her for a long time, so he knows a lot about her, her attitudes, her other diseases, and her other medications, and that's what's important. There's another quote from William Osler which, in the interest of time, I'll let you read, and now you've read it or not. We've now deployed five drug-gene pairs, clopidogrel, simvastatin, warfarin, thiopurines, and tacrolimus, and those are displayed in the electronic medical record. This is a screenshot of what our electronic medical record looks like. I've blacked out all the identifiers except this one here, because I want to make sure people know that I still see patients, and these are the genetic variants that belong to this particular patient.
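The genotype-to-diplotype-to-phenotype translation and the clopidogrel alert described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Vanderbilt's decision-support logic: the allele-function table covers only the alleles named in the talk (*2, *3, and *4 treated as loss of function, *1 as normal), *17 is deliberately left as "uncertain" to match the point that nobody knew what to do with it, and any unlisted (for example, rare) allele yields an indeterminate result. The function names and the alert wording are assumptions, and the alert does not name the alternative drugs:

```python
from typing import Optional

# Hypothetical allele-function table limited to the alleles named in the talk.
ALLELE_FUNCTION = {
    "*1": "normal",
    "*2": "none",
    "*3": "none",
    "*4": "none",
    "*17": "uncertain",  # left unresolved, as in the talk
}


def predicted_phenotype(diplotype: str) -> str:
    """Translate a CYP2C19 diplotype such as '*2/*3' into a predicted metabolizer phenotype."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {diplotype!r}")
    # Unlisted (e.g. rare) alleles are treated as unknown rather than guessed at
    functions = sorted(ALLELE_FUNCTION.get(a.strip(), "unknown") for a in alleles)
    if "unknown" in functions or "uncertain" in functions:
        return "indeterminate"
    if functions == ["none", "none"]:
        return "poor metabolizer"
    if functions == ["none", "normal"]:
        return "intermediate metabolizer"
    return "normal metabolizer"


def clopidogrel_alert(drug: str, diplotype: str) -> Optional[str]:
    """Return advisory text when clopidogrel is prescribed for a predicted poor metabolizer."""
    if drug.strip().lower() != "clopidogrel":
        return None
    if predicted_phenotype(diplotype) != "poor metabolizer":
        return None
    return ("CYP2C19 poor metabolizer: reduced bioactivation of clopidogrel expected. "
            "Consider an alternative antiplatelet treatment.")


for d in ["*1/*1", "*1/*2", "*2/*2", "*3/*4", "*1/*17"]:
    print(d, predicted_phenotype(d))
print(clopidogrel_alert("clopidogrel", "*2/*2"))
```

Keeping the allele table separate from the rules means that when the evidence for an allele such as *17 changes, only the table entry needs to be updated.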
This is a partial list of his medications, which actually continues down here. What's interesting to me is that he's been on warfarin for a long time, so we didn't use the warfarin information to tailor his dose. But he has a loss-of-function allele in CYP2C9, and he takes a remarkably low dose of warfarin, only three milligrams a day, so that's probably the explanation; had we known at the beginning, we would have started him at about three milligrams a day. We also use this kind of technology to display variants in tumors, in our personalized cancer initiative, an effort very similar to, but probably smaller than, the one you just heard about from Dr. Garraway; this example is a BRAF mutation. The other thing is that after you've deployed five drug-gene pairs, you can start to ask how many people have variants in one or more of these pathways. What's interesting is that the number who don't have anything is getting smaller, and of course every time we deploy a new drug-gene pair, that number has to get smaller. It's obvious, almost trivial, except to point out that if you're going to do multiplex testing, what you find out is that everybody in this audience is abnormal for something. Those of us who live in genome land all recognize that, but these are real data that speak to it. When I speak to lay audiences or non-genomics audiences, it sometimes comes as a surprise to them that we're all abnormal for something; you just don't know what it is. So I think this multiplexed approach is really the only way to go, and I think many people in this room have thought that problem through and understand it.
We also engage patients. We have a website called My Health at Vanderbilt where you can look at pieces of your record and make appointments with your doctor. When you look at your record, you can look at "genes that affect my medicines," and then you can see your report for the drugs we've targeted. Those are all works in progress, and you can see we're still figuring out exactly how to deliver that information. We steal from, sorry, we adapt 23andMe's materials in thinking about this, because they do a reasonable job of explaining it, and we think that's going to be very, very important for engaging patients in all this. So we have PREDICT at Vanderbilt, I'm part of the PGRN, and we're also part of the eMERGE Network. I was having a conversation with Teri Manolio, who directs the genomic medicine initiatives at NHGRI, and she said, why don't we take the PGRN's next-generation sequencing platform for all those important pharmacogenes you're working on and put it into eMERGE in a PREDICT kind of algorithm? That was an idea born at the end of a long day, and that project is actually underway now. It takes advantage of the expertise and capabilities of two separate networks that I think ought to be more closely aligned than they are, and I'm working on that. I want to summarize by highlighting in words what the lessons are. First, I think we have not finished discovering. Those of us who focus on implementation are working to deliver some of that to patients, but that doesn't mean we know everything we need to know; we need to know a lot more, and you've heard that all day today. The low-hanging fruit of pharmacogenomics is much more complicated than we think, but I think the lessons we're learning along the way in this space will make us smarter about delivering genomic information in the course of health care.
More generally, among the problems I've highlighted, this business of rare variants is going to be a real problem in pharmacogenomics and everywhere else, and then there's the problem of ancestry, which we're only beginning to think about now. This is team science; it's interdisciplinary, and you have to engage lots and lots of people, not just getting their grudging approval but getting their enthusiastic engagement. There are huge educational needs in every constituency you can think of. The evidence changes, even in pharmacogenomics, where you say CYP2C19 does this, and a year later you think maybe it does something else in some other context. So you really have to be attuned to the fact that the advice you deliver today may not apply perfectly tomorrow. It goes without saying, but I have to say it: an Illumina run might be 99% accurate (I have no idea what that number really is), but that's not good enough for a clinician, because it has to be 100% accurate. Those of you who are clinicians will understand the reason. I walk onto the wards and the nurse or the resident tells me that this patient has renal failure, their creatinine is eight today, and I ask what their electrocardiogram looks like and how they feel. If they feel fine and their electrocardiogram is normal, I say that's a lab mistake; any clinician would tell you it must be a lab mistake. But if somebody tells me this person is a poor metabolizer, I have no context for that, so I have to be sure that the data I get are correct. I think this is only going to happen in an electronic medical record environment. We're thinking about ways to deliver this kind of thing to people who have less advanced electronic medical record systems than we do, and I think that's going to happen, but the only way this really happens is with institutional will. I'll close by talking about the teams; I've acknowledged these teams up here.
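The point about 99% accuracy can be made concrete with Bayes' rule: when a result is uncommon, even a small error rate means a noticeable share of reported poor-metabolizer calls are wrong, and the clinician has no other context to catch them. A minimal Python sketch under two loudly stated assumptions: "99% accurate" is read as 99% sensitivity and 99% specificity for the poor-metabolizer call, and the 334 CYP2C19*2 homozygotes among 12,521 PREDICT patients serve as a rough prevalence (it ignores other poor-metabolizer diplotypes):

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """P(truly positive | test positive), by Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive / (true_positive + false_positive)


# Assumptions: *2/*2 homozygotes in PREDICT as a rough poor-metabolizer prevalence,
# and "99% accurate" read as 99% sensitivity and 99% specificity.
prevalence = 334 / 12521
ppv = positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99)
print(f"PPV = {ppv:.2f}")  # PPV = 0.73
```

Under these assumptions, roughly one in four reported poor-metabolizer calls would be false positives, which is why the genotype data themselves have to be right.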
These are the individuals at Vanderbilt. I can't walk through all of them, but there are geneticists, informaticists, ethicists, lab people, fellows, translational scientists, and these three guys down here, who are the institutional leadership. Thank you very much again for the opportunity to participate.
So our next speaker is David Botstein from Princeton, and David is going to tell us about the fruits of the genome sequences for society.