So our next speaker is Dan Roden, who's the assistant vice chancellor for personalized medicine in the Department of Medicine at Vanderbilt University, and he's going to talk about outcome data and links to electronic medical records.

Thank you for the invitation. This is my introductory slide that has the little Vanderbilt logo down there in the corner. Well, I've done that; I've managed to take it away. I should get the real pointer. That's the Vanderbilt logo, but that's the... oh, it was on a second ago.

I'm going to talk about eMERGE. eMERGE comes in two flavors. The Electronic Medical Records and Genomics network is a creation of NHGRI. We've been part of it, along with Gail, since the beginning. These are the original five sites, plus the sites that helped manage the resource. The original idea was just to try to understand whether the marriage of DNA repositories to electronic medical records was useful in genome science, and I say that in the broadest sense. What I want to do is walk you through some of the results, some of what I think are the high points and learning points from the results, and make the argument that this is a robust kind of resource that should be considered for inclusion in any kind of sequencing in the future.

So one of the first things we did was each site got to choose a phenotype of its choice and then identify cases and controls from the electronic medical record. It sounds very simple, and one of the things we've learned since 2008, which is when we started to do this, is that it's not very simple. This is the algorithm that was developed at Northwestern for type 2 diabetes cases. I'm not going to walk you through it, just to show you that there are lots and lots of different arrows, and some of the arrows say things like, you know, greater than two dates? Is there an abnormal glucose or hemoglobin A1c?
So there are things that have to be found, and things that have to be found in time order. There's also a separate algorithm for controls. So that was cases; this is controls, and again a long series of algorithms: not just the failure to mention the diagnosis, but somebody actually looking for the diagnosis and not finding it. So that's how we did diabetes.

At the end of the day, we develop algorithms, deploy them in the electronic medical record until you find, say, a hundred cases, and then some real human being goes through those hundred cases and says yes or no. And we develop a positive predictive value for the case definition or for the control definitions, and I put some of the high points on here. We generally do somewhere between 100 and 200 cases or controls for algorithm validation.

I show you this just to show you that it works. Each one of the five sites had a phenotype of interest. We proceeded to identify patients who had those phenotypes and do genome-wide genotyping and genome-wide analysis. One common thread was that each site got their phenotypes, got their genotypes, did their analysis, and had nothing to show for it. And then each site went to all the other sites and said, oh, by the way, you genotyped people for cataracts, but some of those cataract patients must have type 2 diabetes. Can you find them using the algorithm we've developed? And the answer was, in four out of five cases, we managed to find or replicate things that we had expected to find before. So this is a replication of the TCF7L2 hit in diabetes.
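The chart-review validation described above, where a human reviewer says yes or no to each algorithm-identified case, boils down to a simple proportion. A minimal sketch in Python, assuming the review results are recorded as booleans; the function name and the counts are hypothetical illustrations, not eMERGE code or eMERGE results:

```python
def positive_predictive_value(review_results):
    """Fraction of algorithm-flagged charts that the human reviewer confirmed.

    review_results: list of booleans, True when the reviewer agreed
    with the algorithm's call for that chart.
    """
    if not review_results:
        raise ValueError("need at least one reviewed chart")
    confirmed = sum(1 for agreed in review_results if agreed)
    return confirmed / len(review_results)


# Hypothetical review: 100 algorithm-flagged cases, 94 confirmed by the reviewer.
reviews = [True] * 94 + [False] * 6
ppv = positive_predictive_value(reviews)
print(f"PPV = {ppv:.2f}")
```

The same function would apply to a control definition, where True would mean the reviewer confirmed that the diagnosis was looked for and not found.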
I don't know, my slides must be on a timer, but I'll just leave it at that.

So one of the things that we then did was we said, well, let's see if we can do some work with phenotypes across the entire network. We settled, for reasons that I can't even remember, on the phenotype of hypothyroidism, and we developed algorithms again for cases and controls; these are shown here. The idea was we would deploy the algorithms and then Terry was going to pay for extra genotyping for us to be able to do this genome-wide association study. We actually deployed the algorithms and found that we had enough cases and controls without having to do any extra genotyping, so we spent the money on something else.

In this particular case the phenotypes were validated at each site, with a positive predictive value at each site. Notice the positive predictive values are not perfect. At Mayo, for reasons that we still don't understand, the positive predictive value for the case definition is lower than at the other sites, but the case and control positive predictive values are pretty acceptable overall.

I need to mention that all these phenotype definitions are posted, and so your informatics guys can look at this website, PheKB, and go and try to replicate those to find cases and controls in your electronic medical record. I also should say that of the five original sites, there are at least three different electronic medical record systems.
There's Epic, there's a homegrown system, and then there's I can't remember what else, but there are at least three different sites.

So this is what the hypothyroidism genome-wide association study looks like. There's a peak that replicates in a separate set. The closest gene is FOXE1, and I just want you to remember this rs number because it'll come back in one second. FOXE1 turns out to be a transcription factor that has been implicated in thyroid cancer, and so we think it's probably real, and some endocrinologist is going to have a field day with that, I suppose.

So these are the phase one sites and the phase one phenotypes. Everybody got to do one phenotype that they designated in their original application, everybody got to do secondary phenotypes, and then there were network-wide efforts. I mentioned the hypothyroidism one. I want to say something about phenome-wide association study, PheWAS, which was actually our secondary phenotype. Phenotype's the wrong word.

Before I say that, I wanted to say a word about hemochromatosis. I asked Gail if she was going to talk about this and she wasn't, so I decided to. There's a paper in the New England Journal about four or five years ago looking at the frequency of C282Y in HFE in northern Europeans. It's a little under 1%, and most of the males who are homozygous don't have a phenotype, and very few of the women have a phenotype. The idea was that this is highly non-penetrant, and so you probably don't need to include this in routine testing. One of the questions we've been asking in eMERGE is: suppose you had the genotypic data anyway, what would you do with it? Which is a different question than going ahead and getting the genotypes anyway. So we looked in BioVU.
That happens to be our data set; Gail is organizing the effort around all of eMERGE. Out of 5,000 people who have genome-wide genotyping, that particular SNP happens to be on the platform we use. So there's about a 1% incidence of homozygosity, and the interesting thing is that some of them carry the diagnosis. Many of them don't carry the diagnosis and probably don't have the diagnosis, but seven of them are receiving iron, which is absolutely the wrong treatment. And so our contention is that, despite these data, if you happen to know, and the question is how do you happen to know, but in a genome-centric world you might happen to know, what would you do with the data? We can certainly envision a day soon where, for people with this particular genotype, their physicians will get little notices saying, you know, by the way, don't use iron, or by the way, look for X, something like that.

Now, PheWAS is this approach that we've been hearing about for the last 16 hours or so of going from genotype back to phenotype. You can ask the question, in a group of people who have been genotyped at a particular variant or across the genome: with what phenotype does the particular genetic variant that you're interested in associate? This happens to be the SNP in the FOXE1 region. When Josh Denny did a PheWAS, the phenotypes here are ICD-9 codes, so we would like very much to refine that phenotypic definition, but when that was done we replicated the hypothyroidism signal like gangbusters. There is a penalty for multiple looks, Daniel, but it's not like we're looking at 500,000 SNPs. We're only looking at a thousand ICD-9 codes.
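The point about multiple looks can be made concrete with a little arithmetic. A minimal sketch, assuming a Bonferroni-style correction at a family-wise alpha of 0.05; the talk does not name a correction method, so both the method and the alpha are assumptions made here for illustration:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate, not stated in the talk

# One SNP tested against roughly a thousand ICD-9 codes (PheWAS).
phewas_threshold = bonferroni_threshold(ALPHA, 1_000)

# One phenotype tested against roughly 500,000 SNPs (GWAS).
gwas_threshold = bonferroni_threshold(ALPHA, 500_000)

print(f"PheWAS per-test threshold: {phewas_threshold:.1e}")
print(f"GWAS per-test threshold:   {gwas_threshold:.1e}")
```

Under these assumptions the PheWAS threshold is 500 times less stringent than the GWAS one, which is the sense in which the penalty for multiple looks is smaller.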
So this is a pretty nice robust signal. There are lots of other thyroid diagnoses. Graves' disease is not one of them, so we can say that there's no association with Graves' disease. I like the idea that there's an association with atrial arrhythmias as well, and an association with an abnormal set of lab values. So we think this is a way of looking at pleiotropy, or looking at phenotypes associated with specific genotypes.

This is another example. This happens to be a SNP that's associated with skin color, and when we do the PheWAS there are strong signals for skin cancer and strong signals for other skin diseases. So we think that's pretty important, and it will be a continuing part of the focus in the electronic medical record as we go forward with rarer and rarer variants, I think.

So in 2010, 2011, something like that, eMERGE expanded to include two more, and if you're looking at the slide, actually three more, adult sites, and we've now expanded to include three more sites that focus on pediatrics. So it's a much larger data set. In the interest of time, I'm not going to read you all the things that we're focused on. We're not just focused on phenotype-genotype associations; there's lots of interest in finding new associations, lots of interest in this question of actionability, and I'll close with a little discussion about that, and lots of interest in the regulatory, consent, privacy, CLIA/CAP kind of issues.

This is what the data set looks like as of this morning. This number keeps on getting bigger and bigger every time Gail gets on the internet, but it's somewhere over 300,000 subjects. And when I say genotyped on a platform, our definition of genotyped here is a platform that allows you to impute, so something dense: a GWAS platform, or perhaps Metabochip or Immunochip. So that's a pretty dense data set.

I love this. So I want to talk a little bit about implementation, because that's one of the things we thought about in eMERGE and in
eMERGE II. This is a cartoon that came out when the first draft of the human genome was announced. This is what Dr. Collins said in the New England Journal of Medicine when he became NIH director. He said, you know, everyone's DNA sequence is already in their medical record and it's simply a click of the mouse. I have highlighted the word simply, because it's many things; that's not one of them. And it should improve outcomes and reduce adverse events. So we all buy into that vision.

The FDA has bought into that vision. There are 58 drugs that have in their FDA labels some mention of variant responses due to known pharmacogenomic variants. One of the poster children for this effort is clopidogrel, or Plavix, and the FDA actually included a black box warning for clopidogrel in 2010 that includes this sentence: consider alternative treatment or treatment strategies in patients identified as CYP2C19 poor metabolizers. The CYP2C19 poor metabolizers are a group of people who have one or two copies of a variant allele called star 2. Those incidence data are from a project that we're running at Vanderbilt right now. The denominator is about 7,000 Vanderbilt patients, so those are pretty accurate numbers. But if you go to the Exome Variant Server that Gail already introduced you to and look at CYP2C19, it's a little bit more than star 2.
There are 67 missense or nonsense mutations, and a third of them have never been seen before. So we think that as we think about implementing, we can't just implement for the common variants. That might be the first baby step, but the next step will be to implement for common and rare.

So I put this on the background of the slide that Eric already showed you. We have a proposal in to NHGRI to run a project called eMERGE-PGx, pharmacogenetics, to start to think about how we would use sequence pharmacogenomic data in the electronic medical record environment. It's an alliance with the Pharmacogenetics Research Network, which has a lot of efforts that involve thinking about what variants are important and actionable, how you might go about putting them into an electronic medical record, and how you might go about building a platform that would interrogate very important pharmacogenetic genes; this is an effort that Debbie Nickerson is leading for PGRN and, of course, eMERGE. I've already told you about this part, and this part here is something that the informatics teams at all the eMERGE sites are very, very interested in. So it's a sort of marriage of convenience, but a convenient marriage of convenience.

There are three aims, and I'm not going to walk you through them, just to say that we're going to find patients, we're going to resequence them, and then when they're actionable we're going to deposit things in the electronic medical record and do stuff around them. So we will tell physicians: your patient has a star 2 variant in CYP2C19, you're about to prescribe Plavix, think about doing something else. And then we're going to find lots of other things. We're only interrogating 84 genes on this platform right now, but we're going to find lots of other things, and we're going to put those into a repository and scratch our heads about them.

I feel like eMERGE has been part of my life for ever and ever and ever; it turns out it's
probably only four years, but what we've learned is that you can find cases and controls. The complex phenotypes, so not just disease but disease-drug, outcome of drug therapy, or disease, complication of disease, response to drug therapy (second-to-last slide), those are harder, but we're working on those right now. Going from the genome back to the phenome is absolutely feasible, and I think that's going to be a gold mine. It's a gold mine for the electronic medical record because these are people who have been ascertained because they come to a health care system. So you can say, well, they haven't been interrogated for every possible phenotype, but they've been interrogated for phenotypes that they or their doctors think for some reason are important. So this is an interesting approach, we think. And the implementation can be done, but it's really, really complicated, and it's a great example of the devil being in all the details.

So Terry had a series of ten questions or 20 questions or seven questions, and these are some of the answers. Many of them don't apply to this particular thought process. I would make an argument that whatever population we decide on focusing on for sequencing efforts should have, as part of their phenotypic repertoire, access to sophisticated electronic medical records. There are big advantages of mining in the electronic records. Those are real patients with real diseases, so that's one thing. We think there's feasibility demonstrated. If you find things and you're worried about how to find cases and controls, at least working in the electronic medical record environment allows you to start to think about how to implement. The rare and extreme phenotypes, I think, should be accessible.
We haven't tested that formally. And what I mean by potential for coupling to other data sets: we're continually being asked about tissue, continually being asked about serum, plasma, other sorts of data sets that you could envision interrogating on a large scale and then integrating with omic sets. The disadvantage is that the phenotype in the electronic medical record is what's in the electronic medical record. People sometimes ask me, well, can't you get X or can't you get Y? One way to do that is to make the electronic medical record better. Another way to do that is recontact. And another way to do it is to say, well, for that particular phenotype you need a different approach. If you want to study cystic fibrosis in gruesome detail, the electronic medical record may not be your best friend. It might be a friend, but it's not your best friend, and you have to go through the things that Mike has described for us. Those are the EMR thoughts for the day. Thank you.

Discussion. Actually, just a quick announcement that we did manage to get you coffee for the afternoon. I realize we do have some European-time folks here, so there is some out there, but don't leave until you've been part of the discussion.

So Dan, could you comment a bit on the issue that the phenotypes are the things that are important to the patient and the clinician? I mean, it seems as though that's an important point that a lot of times we miss, and yet I think in traditional epidemiology we've always sort of said, well, you know, that's only the ones that are picked up by the clinician; the clinician has to be astute enough to notice them, and we may be missing a lot.

No question.
I mean, when people ask me questions about what it is that's good about the electronic medical record, I say it's what happens to people when they're actually encountering the health care system. And if we want to sort of use genomics to bend some kind of curve, I don't know which curve we're bending, to make outcomes better, then whatever it is that people are going to see doctors for, whatever it is doctors are diagnosing, is the starting position. Do doctors make mistakes in diagnoses? Yes. Do they write things down in the electronic record incorrectly, so that the algorithms don't work perfectly or so that we have trouble finding things? Absolutely. Do they misdiagnose? Yes. Do they misdiagnose or miscode on purpose? Occasionally. So there are all those warts, but I think, like many other discussions, if you have 300,000 patients you have perfect as the enemy of good, and nobody expects perfection from this resource to start with. So I think that numbers are important, and we can get numbers out of this.

Yeah. It's very exciting work and a terrific use of the resource, and I love the idea of the phenotype scan. The one thing that we don't have if we just go to EMRs is time passing. Time passing; I'm sorry, one thing we don't have is time passing. So if we want to know what really was the situation 10 years earlier, by and large we don't have it, but you must have it sometimes. So how frequently are you able to get back to what you might call the source population, whence come the people in front of you, so that you could do, let's say, your analogy to what Julie described? She looks now at people who have cancer of X; she opens the freezer with what they put in 20 years ago. What's the EMR equivalent, and how frequently are you able to do it?
So I can answer that for our own site. Our electronic medical record started in 1991. It started to get populated in about the mid-90s, and since about 2000, when I go to my weekly clinic I don't hold a manila folder in my hand. I haven't held a manila folder in my hand for over a decade. It's not 50 years' worth of experience, obviously, but it turns out that when you're taking care of patients you rarely go back more than a year or two or three. So when somebody's sample arrives in our DNA bank, and I'm not going to go through the details of how our bank is organized, but when we get a sample it's attached to their electronic record, and that goes back to whenever it is they entered the system. We have about 145,000 samples right now. Our guess, and it's more than a guess but it's less than perfection, is that somewhere around half of those patients are patients who make their medical home at our place. We're a tertiary care facility, so there are people who touch the system and then go somewhere else, and then there are people who touch the system and stay. Those are the people we're most interested in, because they have the dense electronic records and they have the multiple outcomes and the multiple diseases over time. So I think that many of these resources are being built, and over time they will be richer and richer sources of phenotypes.

Thank you. Gail, and then back here we had a hand.

So for the Group Health data, the electronic medical records actually go back to the 70s, when the lab data rolled in, and the pharmacy data shortly after, and there are decades of data. Our subjects happen to be age 50 and up, so we have long-term data on them at Group Health. They also get all their prescriptions there for free. So not only do we know what was prescribed to them,
we actually know if they're using it and picking it up. We recently did an analysis of white blood count where I think we had on the average 20 values per individual over time, and then we were able to analyze the median and actually analyze the longitudinal trend as well. So, you know, for an HMO in particular, where you have that medication data on top of the quantitative data, it can be very rich.

Yes, actually NIH is funding studying the Kaiser Permanente cohort in northern California, and we have a hundred thousand people and data that goes back 20 years on them with repeated measures. Similar analysis could be done.

So everybody's design is different. There are now, I guess, 10 sites or 11 sites, depending on how you count them, in eMERGE. Ours is an opt-out model. Controversial; I don't want to waste a lot of time talking about that. Gail had to go back and re-consent some of her patients because they had samples but they hadn't been consented the right way. And there's a project at one of the sites that is doing opt-in at the time of clinic registration, and they guess that it takes something like an extra five minutes per encounter to recruit people. I would love to know how deeply the consent process is explained. But so we have all kinds of different models within the system, I'll just say that.

Okay, I saw I think one or two hands. Yeah.

I just wanted to plead guilty to overstating the dangers of the multiple testing problem. I think the issue will loom larger as we start collecting more longitudinal omics data. Then there will be many more phenotypic data points to analyze. But of course Dan is right: in the context of being able to look at a single SNP over many different phenotypes, the burden there is not strong, and you showed some great examples of how elegantly that can be used.

Well, I showed the best example.
Oh, so those are the most typical examples.

The vision, of course, is to create a phenome that's not a thousand entries but 100,000 entries, very, very precise micro-phenotypes, we call them, and then we will get into the problem of multiple comparisons. But it's sort of like a GWAS, where if you get a single SNP that's up there you pay less attention than to a bunch of linked SNPs that make a signal. So a bunch of linked phenotypes that would make a signal is more compelling, like the hypothyroidism example, than a single one that happens to be hanging out up there.

That's a perfect analogy; the linkage works very well in that case. Are there comments, questions, discussion? Yeah, Eric, and then...

Whenever I hear the presentation on eMERGE, it's very impressive. One of the things I worry about is these are typically rarefied environments. Is this going to be transportable out to the population at large? I won't quote you the number of uninsured and poor and those without access to health care in a state like Texas, but it's huge.

So the person on your left has driven an RFA process whose goal is to do exactly that: to take sites that are doing this kind of work, not just the eMERGE kind of work but genomic medicine broadly defined.
I guess I should say the two people to your left. And to try to embed those kinds of advanced technologies, advanced thinking, into environments that are really resource poor. So we'll see how that plays out, but that's something that's high on Terry and Eric's wish list. I wouldn't say an obvious next step, but, I mean, if you're going to push this nationwide, one of the things we've learned in eMERGE, I think, is that what works at Vanderbilt actually does work at other academically minded places with a lot of informatics support. The question is whether we can, not recreate that informatics support, but actually recreate at least the outcomes of those tools to embed in other places. And so yes, we're thinking very hard about how to do that.

I mean, Vanderbilt must have a cadre of outcomes researchers, comparative effectiveness researchers, and so forth. Are they heavy users of this resource, independently of the genomic data that it contains?

The resource is an interesting mix. We've created a de-identified version of the electronic medical record by itself that can be used as a resource for outcomes, for example, independent of genomics. So you can actually do work in a resource that has a denominator of about two million as opposed to 140,000 or whatever. We wish we had more outcomes people, actually. The major users tend to be translational scientists, basic scientists, you know, somebody who's studied gene X for the last 20 years and now wants to know if there are human phenotypes, which I find really an interesting and compelling use of the resource, and then people in the Center for Human Genetics Research, who are obviously big users. We are looking at outcomes, particularly with respect to this project of embedding genotypes in the actual electronic medical record and getting physicians to act on them, and that's a big
long-term... I can't say more than that. We need more people to do informatics and outcomes, like everyone else.

Okay, and one last comment. Dan, is there any effort to link families through the electronic medical record, to study phenomes across, you know...

Certainly at our place, I mean, the only linkage to families that I know about is when, you know, we do GWAS and then look at relatedness across those sets, but we're looking to get rid of that; we're not looking to incorporate that. But I think that if you want to use electronic medical records for the kinds of family studies that we've heard about today and last night, I think that's a recontact issue in general.

All right, thank you. Okay, great. Thanks. So now we're going to move on