I thank Terry and Jeff for the opportunity to be the spokesperson for eMERGE. I'm also one of the eMERGE PIs; I should have introduced myself that way. Fourteen minutes, all right. So this is what I'm going to talk about, and I gave myself this title. This is what the Vanderbilt medical record room looked like 20 years ago, and if we are going to deliver genomic medicine, it's almost inconceivable that we could do it in a paper environment. Although electronic medical records are controversial (there's a story about them in today's New York Times), I deeply believe that electronic medical records help manage individual patient care and enable genomic medicine, in both the discovery realm and the implementation realm. Eric showed you these slides. So what I'm going to talk about is eMERGE, which has two domains: one is discovery, and the second is the more implementation-oriented side. I'm going to talk a little about the discovery side, and then a little more about the implementation side of eMERGE. eMERGE was created in the spring of 2007, and the first phase had the five centers listed here. Our mission was to assess the utility of DNA collections coupled to electronic medical records. The carrot for participating was that each site got to identify a phenotype of interest in around 3,000 subjects and conduct a GWAS. We also had a major focus on privacy and de-identification, which I'm not going to talk about, and a major focus on ethical, legal, and social issues, which I'll touch on a little later. One of the first things each site did was identify its subjects, do the genotyping, run a GWAS, and discover that it had no signal, because it didn't have enough patients. So we went through a very elaborate and laborious process in order to share data.
I think this is an important lesson for all of us who are trying to do genome science, in either the discovery or the implementation phase. These are the elements of the data-sharing memorandum of understanding. With it, each site was able to go to other sites, obtain samples that had been genotyped for one phenotype, such as cataract, but were useful for another, such as diabetes, expand its sample set, and publish a genome-wide association study. We also managed to create data sets across the network that are now quite large. I thought I'd show very briefly how we go about phenotyping. We identify a phenotype, develop an algorithm, and manually assess it in a couple of hundred patients. If the positive predictive value (PPV) is acceptable, we say, fine, we have an algorithm; if it isn't, we iteratively fix the algorithm. Once the algorithm is done, we deploy it at the site that developed it, deploy it at other sites, and then do the genome-wide association study. The important point is that an algorithm developed at one site has to be validated at other sites, and we've done that. This is one of the studies we did in hypothyroidism, and you can see that the case and control positive predictive values are quite acceptable. I really don't know what was going on at Mayo, but their PPV was a little lower than the rest. Still, this is an algorithm developed at one site and implemented at others, and I think that's an important lesson as well. That effort resulted in a GWAS with a signal, and it was done with no extra genotyping: every one of these samples had originally been genotyped for some other indication, so we had readily available data sets we could use for this application. It turns out the signal is next to FOXE1, a transcription factor that has been implicated in hypothyroidism and thyroid cancer.
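The validation step described above, reviewing a couple of hundred charts and checking the case and control positive predictive values, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not eMERGE's actual tooling: the `ReviewResult` record, the function names, and the 0.95 acceptance threshold are all hypothetical, since the talk only says the PPV has to be "okay".

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    """One manually reviewed chart (hypothetical record structure)."""
    flagged_as: str   # label the algorithm assigned: "case" or "control"
    confirmed: bool   # True if the chart reviewer agreed with that label


def positive_predictive_value(reviews, label):
    """PPV for one label: confirmed charts / all reviewed charts the algorithm gave that label."""
    flagged = [r for r in reviews if r.flagged_as == label]
    if not flagged:
        raise ValueError(f"no reviewed charts labelled {label!r}")
    return sum(r.confirmed for r in flagged) / len(flagged)


def algorithm_accepted(reviews, threshold=0.95):
    """Accept only if both case and control PPVs reach the (assumed) threshold;
    otherwise the algorithm goes back for another round of refinement."""
    return all(
        positive_predictive_value(reviews, label) >= threshold
        for label in ("case", "control")
    )
```

With 96 of 100 reviewed cases and 99 of 100 reviewed controls confirmed, the case and control PPVs come out at 0.96 and 0.99, so the algorithm would pass at the assumed 0.95 threshold; the same algorithm could then be rerun and re-reviewed at other sites to check that it transfers.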
The FOXE1 signal made biological sense, and it was replicated and published. Now I want to take a detour to tell you about an approach that we and eMERGE have been developing that I think is uniquely suited to electronic medical records. A GWAS starts with a target phenotype and then displays the data in the way you're all familiar with. The phenome-wide association study, or PheWAS, turns that design on its head: it starts with a target genotype. Suppose the genotype is the SNP associated with hypothyroidism. We then ask, across electronic medical records containing many diagnoses, with which phenotypes does that SNP associate? Comparing patients who carry the reference allele with those who carry the variant, what does the chi-square look like for each phenotype? And we display the data in exactly the same way as a GWAS. The requirement for that experiment is a large number of patients with many diagnoses and lots of genotypes, which makes it uniquely suited to the eMERGE environment. This is what the PheWAS looks like for the FOXE1 SNP. We replicated the hypothyroidism signal and also identified other signals: thyroiditis; pernicious anemia, which may be clinically related; and atrial arrhythmias, which are also part of the spectrum. So that's an interesting validation. We've now done this for other SNPs. One of our best examples is an IRF4 SNP that is associated with hair and eye color and is clearly associated with skin cancers and other kinds of skin disease. We've now done this for all the SNPs in the GWAS catalog that Francis showed you. This provides a replication set, and it also identifies potential new associations. All the data are publicly displayed, because that's our philosophy: we generate the data, and then you can go to this website, look up your favorite gene, your favorite SNP, or your favorite odds ratio, and find what we have found.
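The PheWAS logic described above can be illustrated with a short Python sketch: for one SNP, cross-tabulate variant carriers against every phenotype code in the records and compute a one-degree-of-freedom chi-square for each. This is a simplified, hypothetical illustration (the function names and the input dictionaries are assumptions); a production PheWAS would typically use curated phenotype groupings, allele-based or regression models with covariates, and a multiple-testing correction.

```python
import math
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so there is no association to test
    return n * (a * d - b * c) ** 2 / denom


def phewas(carrier, diagnoses):
    """
    carrier:   dict patient_id -> True if the patient carries the variant allele
    diagnoses: dict patient_id -> set of phenotype codes in that patient's record
    Returns (phenotype, chi2, p_value) tuples sorted by ascending p-value.
    """
    patients = list(carrier)
    phenotypes = set().union(*(diagnoses.get(p, set()) for p in patients))
    results = []
    for pheno in phenotypes:
        counts = Counter((carrier[p], pheno in diagnoses.get(p, set())) for p in patients)
        a = counts[(True, True)]    # carriers with the phenotype
        b = counts[(True, False)]   # carriers without it
        c = counts[(False, True)]   # non-carriers with the phenotype
        d = counts[(False, False)]  # non-carriers without it
        chi2 = chi_square_2x2(a, b, c, d)
        # Upper-tail p-value of a chi-square with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(chi2 / 2))
        results.append((pheno, chi2, p_value))
    return sorted(results, key=lambda r: r[2])
```

The sorted list is the raw material for the PheWAS plot: each phenotype becomes one point at its -log10(p), laid out the same way a GWAS lays out SNPs.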
Like the PheWAS results, the phenotypes we identify are publicly displayed, in a knowledge base called the Phenotype KnowledgeBase (PheKB), where the algorithms are all available for anyone to use. eMERGE then moved into Phase II in 2011. We expanded the number of sites to include Geisinger and Mount Sinai, and added the pediatric centers shown here. Our mission was to expand the electronic phenotyping efforts, but also to start implementation, both in site-specific projects and across the network, along with other initiatives that I'm not going to discuss in detail. One thing going on right now, following up on the Common Rule issue that Dr. Collins alluded to, concerns consent for biobanking. We are developing questionnaires and evidence sets that will enable a very large survey, of 100,000 patients across eMERGE institutions, to identify policy issues around biobanking with input from the actual consumers. So that's an important network-wide initiative. Another network-wide initiative is in the area of hemochromatosis. The variants that cause hemochromatosis, shown here, turn out to be relatively common, and homozygotes are common too; we have a total of about 1,500 of them across the eMERGE sites now. The questions we're asking are: how many of these patients actually carry the clinical diagnosis? And among those who don't, do they have phenotypes indicating that they should? We actually have examples at our own center of patients with this diagnosis who have been transfused, which is really the wrong treatment. So we're interested in knowing whether this should be implemented in electronic medical records more generally. There are also site-specific genomic medicine implementation projects, which I've divided into broad focus areas.
One is developing genetic risk scores and evaluating their potential clinical impact. Another set of projects genotypes specific variants and evaluates the impact of returning those results on physicians and patients; you can see the specific projects listed here. The Geisinger group has a whole-genome sequencing effort for undiagnosed, mainly neurological, disease. Then a number of sites have a pharmacogenomics focus, including a project on hepatitis C; one on CYP2D6 genotyping, an enzyme that is among the most important for drug disposition and also among the most problematic to assay; and a project around asthma. At Vanderbilt, we've had for a couple of years a large, multiplexed, preemptive pharmacogenetic testing project that got some support through this mechanism, and that morphed into the project I'm going to talk about now. We at Vanderbilt are part of both eMERGE and the Pharmacogenomics Research Network (PGRN), another large NIGMS-funded network, and the project I'm going to outline is a collaboration between the two networks. We call it eMERGE-PGx. One thing PGRN brings to the table is an effort led by Mary Relling, who's in the audience, around the question: if you have genetic data available to you, what should you do with it? In CPIC, the Clinical Pharmacogenetics Implementation Consortium, we don't ask whether people should be genotyped. We ask: if you already have the genotype information, because someone has had whole-genome sequencing, exome sequencing, or targeted genotyping, what should you do with it? CPIC publishes guidelines on that, and there's a very active working group around them.
PGRN also developed a resequencing platform for 84 of what we call very important pharmacogenes (VIPs), including CYP2D6 and other CYPs, transporters, and drug-target genes. We've spent a lot of time thinking about how to implement it in a clinical environment under CLIA-compliant quality control standards. That's what PGRN has focused on. eMERGE, the Electronic Medical Records and Genomics Network, has focused on privacy, electronic phenotyping, assembling large populations, and decision support. In the eMERGE-PGx project we have three aims. The first is to identify the patients in whom we are going to deploy this PGRN platform. We also aim to develop a list of actionable variants, and as those of you who have thought about this problem will recognize, that list is pretty small; it might be fewer than a dozen. We then resequence the 84 VIP genes and identify the actionable variants. The target is to do this in 9,000 subjects across the network. The main focuses across the network are CYP2C19 and clopidogrel. Warfarin is also a focus, although recent data make people a little nervous about implementing genotype-guided warfarin dosing in a routine clinical environment. There's a simvastatin implementation project, and there are others that I'll mention if somebody asks. There are two ways to identify target patients: one is to recruit new patients, and the other is to look at patients who are already in the biobank. Either way, we're using predictive algorithms that estimate the likelihood that a patient will be exposed, over the next three or five years, to a statin or to an antiplatelet or anticoagulant drug. These informatics approaches allow us to identify so-called high-risk patients and engage them in eMERGE-PGx.
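Identifying the small set of actionable variants among everything found by resequencing the 84 VIP genes amounts to partitioning the variant calls. The Python sketch below is hypothetical: the `VariantCall` record and the `partition_calls` function are invented for illustration, and the three-entry list pairs well-known CPIC gene-drug examples (CYP2C19 with clopidogrel, VKORC1 with warfarin, SLCO1B1 with simvastatin) with commonly cited variant IDs. It is not the eMERGE-PGx actionable list.

```python
from dataclasses import dataclass

# Hypothetical, illustrative actionable list (NOT the eMERGE-PGx list):
# variant ID -> (gene, drug whose prescribing the variant may affect)
ACTIONABLE = {
    "rs4244285": ("CYP2C19", "clopidogrel"),
    "rs9923231": ("VKORC1", "warfarin"),
    "rs4149056": ("SLCO1B1", "simvastatin"),
}


@dataclass(frozen=True)
class VariantCall:
    """One variant observed in one patient's resequencing data (hypothetical record)."""
    patient_id: str
    gene: str
    variant_id: str


def partition_calls(calls, actionable=ACTIONABLE):
    """Split calls into actionable ones (candidates for the EMR and decision
    support) and all remaining calls (kept for research), preserving input order."""
    to_emr, to_repository = [], []
    for call in calls:
        (to_emr if call.variant_id in actionable else to_repository).append(call)
    return to_emr, to_repository
```

Calls in the first list would be deposited in the electronic medical record to drive decision support; everything else, common or rare, would be retained for research, which is the role the project's third aim gives to its variant repository.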
So target patients include both newly recruited subjects and existing subjects in the biobank. The second aim is to take the results for patients who have actionable variants, deposit them in the electronic medical record, display the data, deploy decision support, and then track outcomes. I'll tell you more about the outcomes in a moment, but basically there are two kinds. The first are what we call process outcomes: how well does this work? How good is the genotyping? How many times does the decision support fire? The other outcome, of course, is how all of this affects people's healthcare. The third aim is to create a repository for all the other variants we're going to discover, common and rare, that are not clearly linked to variable drug response. That repository is in the process of being created, and it has a name and a logo, which is the most important first step toward developing its utility. It's called SPHINX, and I wish I could remember what SPHINX stands for, but I'm pretty sure the P stands for pharmacogenetics. I'm not going to list all the outcomes, but again, there are process outcomes: How many patients do you recruit? What does the sequencing look like? Most centers are validating their sequencing results on an orthogonal platform, so what does that look like? And there are the others listed here. Then there are the healthcare outcomes, which we're thinking about with respect to specific drugs. In addition, among the 84 genes, there are six that the American College of Medical Genetics has designated, quote, "actionable," unquote. If we find a variant in one of those genes, there is a view that we should be returning those results to patients and physicians, and we're going to have a project on how to do that and on what kinds of variants we find. This is, I think, my last slide. This is what eMERGE looks like right now, and these are the numbers.
The total is about 350,000 subjects. There are genome-wide association data at each of these sites, and the genomics working group has undertaken a huge project to build an imputed set of about 51,000 genomes across these sites. We think this is going to be an incredibly rich resource for discovery. I thought I would close with a personal opinion: if we're going to personalize medicine, if we're going to treat an individual patient differently from some average approach, then we have to have evidence, and to have evidence you need very large numbers. I call this the paradox of personalized medicine. If you have 100 patients and you want to treat one of them differently, there's no way you could ever develop evidence for that. If you have 100,000 patients and you want to treat 1,000 of them differently, then you can start to develop evidence. So in order to individualize therapy for individual patients, we need very large data sets to draw from. One of the lessons I think we're learning across eMERGE is that those data sets have to come from multiple ancestries, and eMERGE represents some ancestries but not all. So one of the great appeals of this meeting for me is the notion of creating these kinds of resources worldwide, to understand genetic diversity and to personalize medicine across the world. Thank you very much.