Okay. So the eMERGE program had a workshop in October of 2017 to review the current goals and accomplishments of eMERGE and to suggest future directions for possible continuation of the eMERGE program. Sharon will tell us about the recommendations of the group. Sharon? Good afternoon. I'm really representing the organizing committee here in trying to summarize the results of this workshop. I just want to start off by reminding everyone that the workshop was videocast; the workshop summary and the videocast are available for anyone who wants to see any of the details of the presentations that were made, and a long report and an executive summary were produced as well. There were really two goals of the workshop. The first was to review the current goals and accomplishments of eMERGE, and I will say right off the bat that I'm only going to discuss those briefly, because I was mainly asked to focus on the concepts and gaps in the field of genomic medicine that a potential new phase of eMERGE could address. So just a brief overview: the eMERGE Consortium was started in 2007 with four years of funding in the first phase and was then renewed for two subsequent phases. Each phase has focused on expanding questions related to the use of electronic medical records, and as you can see there was a significant increase in genomic sequencing technology over that time, with the analyses starting with a fairly limited number of genotypes and most recently sequencing of a multi-gene panel. And of course there was ELSI research incorporated throughout. So the most current phase, which I'll just refer to as phase three, had three main goals, provided here. The first goal was to sequence and assess clinically relevant genes in about 25,000 individuals, and I'll talk about the genes in a minute. 
The second goal was then to assess the phenotypic implications of variants that were identified in these genes. And the third, as the "e" of the eMERGE acronym suggests, reflects the network's very heavy use of, and research into, electronic health records for clinical care. The overall goal, of course, is to create community resources that can be used in a variety of different consortia. So just briefly, this is the map view of the eMERGE network in its third phase of funding, and you can see that there are a number of institutions throughout the country. A slightly more detailed view can be found here, and this summarizes the different components of the eMERGE network. Currently there are 10 project sites, the groups that are actually carrying out specific projects related to patient cohorts. There are two sequencing centers, one through Partners/Broad and the other at Baylor College of Medicine, and there are two coordinating centers, each of which has a slightly different function, at the University of Washington and Vanderbilt. And similar to other NHGRI consortia, much of the work of the consortium is done through six eMERGE subgroups that I won't go into in detail here but that have been very active, and this organization has often met three times a year to keep all of the work coordinated. So just briefly, here are the three aspects of eMERGE III that I want to make clear, because they will come up in the comments about the future. The first is the actual platform: the eMERGE phase three sequencing platform included a total of 109 genes, including the American College of Medical Genetics and Genomics (ACMG) actionable secondary findings genes as well as genes that were nominated by the individual sites. In addition there were some specific SNPs that again were nominated by individual sites, including some that were related to HLA typing. 
So there were 109 genes that were sequenced, as well as 1,500 SNVs that were analyzed, although not necessarily reported back for each subject. The process of return, however, was more heterogeneous, and this is just to show that there were two main pathways. The sequencing results were returned to the sites and reviewed in each case by a committee. At seven of the nine sites, the participant was then contacted about the finding of a potential result, the results were returned by a genetic counselor and uploaded to the electronic health record, and then the primary care physician was informed. At another two sites, the data were uploaded to the electronic health record first, the participants were contacted, and then the results were returned by a genetic counselor or a specialist. My understanding is there were also some minor variations, so none of the nine sites returned results in exactly the same process, and they also varied somewhat in the specific results that were returned beyond the ACMG 56. Another critical aspect of eMERGE, which again will come up in our discussion about the future, is the very extensive work that's been done to derive consistent and reproducible phenotypes from the electronic health record. To accomplish this goal, a number of different tools and knowledge bases were developed, as described in this elegant figure, including eleMAP and PheKB, the latter a collaborative environment that was developed for building and validating electronic phenotype algorithms. 
So as someone not in the field I didn't really know what this meant for a while, so just for others who might be listening: it's really, how do you know that a patient with type 2 diabetes actually has type 2 diabetes? It might be on the problem list, it might be in a physician note, it might relate to a medication that was ordered, and so these are very detailed methods that let you say consistently, across an electronic health record, that someone has that diagnosis. And this is just a summary of the phenotypes that have been developed over the three phases of eMERGE funding; they expect that by the summer of 2018 there will be 70 such disorders for which an electronic phenotype is available to be used in a research setting. Okay, so with that brief introduction to eMERGE, I want to move on to the workshop. The workshop was held on October 30th, 2017. We had an introduction to the eMERGE consortium and its accomplishments early in the morning, and then the workshop really focused on making recommendations with regard to four major topics: first, electronic phenotyping for genomic research; second, evidence generation for genomic medicine; third, EMR integration of these results and automated decision support; and fourth, novel and disruptive opportunities in genomic medicine. So I'm just going to go through the recommendations of each of the four subgroups. With regard to electronic phenotyping, the key recommendations really focus on developing better methods that take into account the continuum of disease severity and longitudinal phenotyping. Currently you either have type 2 diabetes or you don't; the phenotypes don't allow you to say that this person has a severe case, or that a disorder may have resolved, and so there was a need to take into account this kind of longitudinal phenotyping of the patient. 
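To make the idea concrete for readers outside the field, here is a minimal sketch of what a rule-based electronic phenotype algorithm might look like. The diagnosis codes, drug names, thresholds, and record format are illustrative assumptions for this sketch, not the actual eMERGE/PheKB type 2 diabetes algorithm:

```python
# Illustrative rule-based electronic phenotype for type 2 diabetes.
# Codes, drug names, and thresholds are made-up examples, not the
# actual eMERGE/PheKB algorithm.

T2D_ICD9 = {"250.00", "250.02"}          # example diagnosis codes
T2D_MEDS = {"metformin", "glipizide"}    # example oral hypoglycemics

def classify_t2d(record):
    """Return 'case', 'control', or 'uncertain' for one patient record.

    record = {"icd9": [...], "meds": [...], "hba1c": [...]}
    """
    dx_hits = sum(1 for code in record.get("icd9", []) if code in T2D_ICD9)
    on_med = any(m.lower() in T2D_MEDS for m in record.get("meds", []))
    high_a1c = sum(1 for v in record.get("hba1c", []) if v >= 6.5)

    # Case: multiple independent kinds of evidence agree.
    if (dx_hits >= 2) or (dx_hits >= 1 and on_med) or (high_a1c >= 2):
        return "case"
    # Control: no evidence of diabetes anywhere in the record.
    if dx_hits == 0 and not on_med and high_a1c == 0:
        return "control"
    # Everything else is excluded from both case and control groups.
    return "uncertain"
```

The point of the combined rules is exactly the one made above: no single field of the record (problem list, note, medication order) is trusted on its own, so the algorithm requires independent lines of evidence to agree before declaring a case.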
A second major recommendation was to improve the speed and efficiency of developing and implementing the phenotyping strategies, as well as the relatively manual validation process that is currently used to determine the adequacy, or validity, of a method. And finally, even once these phenotypes were available, there were recommendations from the group to try to increase the efficiency with which they could then be deployed across the sites in the consortium, as well as sites outside the consortium, to increase the pool of data that might be analyzed. With regard to evidence generation for genomic medicine, the topics here focus on ways of improving the generation of this evidence. A key recommendation was to reflect on the prior three phases and document, by the end of the current phase, what the eMERGE investigators really consider best practices for genomic medicine, so that those practices would be brought forward into any subsequent funding. There was a lot of discussion about thoughtfully trying to balance standardization across the sites while allowing for innovation at the individual sites, to really maximize the amount of data that could be compared across sites. And I think this is an issue that many of us in the consortium always deal with: how much do we work on individual projects versus having standardization across the projects so that we can share data effectively? There were several points made by the group related to return of results, particularly trying to increase automation in how clinicians would receive the results from this kind of consortium. Also, there had not been much work done yet in the eMERGE phases on returning negative results, but obviously the majority of genetic testing we do of any sort is negative in the classic sense of a positive result. 
A future eMERGE consortium should also think about what we learn when we return negative results to patients. And then finally, a really pragmatic issue: trying to front-load the sequencing as much as possible so that you have a long enough period of follow-up to really be able to assess outcomes of any sequencing intervention. There was just a feeling that the study designs needed to take that into account. This topic gets two slides because the electronic health record is such a critical issue in eMERGE: a strength of eMERGE is the integration of genomic results with the potential for automated decision support. There were a number of recommendations for future studies to encourage the development of automated decision support, with a potential focus on a few key topics. Obviously, further automation of variant classification to speed the ability to develop these results; developing tools to improve visualization of results by physicians, since most genetic test reports are not really readable by the vast majority of physicians who receive them; and developing user-centered designs, particularly for high-priority topics, so that if a genetic result is associated with a key clinical decision or drug prescription, we have efficient clinical decision support focused on those high-priority topics, as opposed to genetic results that might not have a clear guideline associated with them. 
Of course, again there was the idea of trying to make these decision supports as shareable as possible across centers and across electronic health records. That's a major challenge with the different systems; even the same commercial system may be deployed in many different ways at different hospitals, so we need to make sure that the decision supports being developed can actually be used across multiple hospitals. And then there was the idea of having the decision support keep track of how often it actually results in an action, so you have a kind of closed loop: you deploy the decision support and get results back to see whether it's actually working, without having to do a separate survey or analysis of the data. In addition to the focus on decision support and physician behavior, there were also a number of recommendations to determine which patient-specific factors might influence the utility of clinical decision support: are there certain types of patients for whom the decision support is really likely to improve their health care or their wellness, as opposed to other patients? That is really more of a major focus now, given the emphasis on diversity in the NHGRI grant portfolio moving forward. There was also the idea of developing road maps. Many of us, and I'm counting myself here, are relatively naive adopters when it comes to deploying decision support in a hospital electronic health record. You really need to be able to develop a road map that tells someone in a new hospital or medical center that's joining a consortium how to get this deployed in their system, and then develop standard ways to extract data from the EMR for research across sites, again both within and outside of eMERGE. 
Again, this is an ongoing theme of trying to make the tools and the data extraction work both at funded sites and at sites that may want to participate in a particular area even if they're not formally part of the consortium. The group that probably had the most fun, and where we had the most lively discussion, considered how to use new methods and tools that are disruptive, in the positive sense, to health care and that may impact this line of research. This might include tools that allow for rapid, real-time variant interpretation using publicly available data sets and expert curation; assessing crowdsourcing for variant classification; and new methods for efficient reinterpretation of genomic variants. Much of this could be done in consultation with the ClinGen resource, which is also working in this area, along with the idea of using deep learning techniques to characterize uncertain variants, drug targets, et cetera. Similarly, with regard to disruptive technologies, there was an expectation that a variety of omics methods will be combined with genomics, such as, for example, the proteogenomics work that's now going on in cancer patients, breast cancer patients for example. The attendees of the workshop anticipated that there will be a variety of methods, whether websites or apps, by which patients and physicians get data to better inform them with regard to sequence results. And of course, there may be the use of wearables and other devices that can aid in assessing the phenotype of patients with a putative disease variant; you can imagine using a watch that detects blood pressure or heart rate in patients who have cardiovascular variants. 
And there was overall a strong emphasis on patient-centered data governance and recommendations for future ELSI research on issues related to eMERGE research. There was the assumption that more and more patients will want to be interacting directly with their data, and any future consortium should take that into account in its design. So, the final slide just tries to summarize these recommendations from the workshop. It is important to decide on the appropriate balance between innovation and standardization across the sites to increase the power of data analysis. Clearly, there's a need to expedite variant classification by automation, machine learning, and crowdsourcing. Certainly, there's a need to test innovative ways to present genomic results to physicians and patients, and to perform longitudinal follow-up of patients by shifting the sequencing and return of results early in the funding phase. There is a need to increase the efficiency of developing these EMR-derived phenotypes for the eventual re-phenotyping that the consortium will want to do. There was a recommendation to increase engagement of diverse patient populations on ELSI issues related to genomic testing and use of electronic health records. And finally, to facilitate the usefulness of the eMERGE tools, and the analyses, that develop out of any consortium, across multiple research consortia and diverse health care systems. I just want to acknowledge the planning committee, in particular Rex Chisholm, Dan Masys, Howard McLeod, and myself, and then we had a large group of workshop attendees, whom I'll show you in the next picture. And in particular, the staff shown here were really helpful in getting the workshop put together, as well as creating the summary statement and the executive summary and helping me with this presentation. And this was the workshop. Thanks. Well, thank you, Sharon. Are there any questions for Sharon about the workshop report? 
Since I'm often the person who asks the first question, someone else has to do it. Yes. Yeah, I'm not sure if this is something that you're prepared to speak about or not, but could you maybe say a few words about the status of the phenotypic data sharing? You know, clearly NHGRI and others have really pushed forward genomic data sharing, to the point that we now all have clauses in our NIH grants. But it seems like clinical phenotyping remains a hot-button issue, as does your ability to share it. So maybe you could just say a few words. Well, I'll say very briefly, and then Teri can answer much better: it is important to realize that most of the phenotyping here was derived from the electronic medical record, unlike some of the consortia that were thinking about literally re-phenotyping, like calling a patient back in and re-examining them. And I think some of these tools are specifically designed to then make those phenotypes available, but Teri can answer that better. No, that's exactly right. In fact, all of the phenotypes in eMERGE are derived from the electronic medical record, and Dan, you may want to comment after I do; Dan's group has done a lot of work on privacy issues. There are obviously big concerns about giving access to somebody's entire medical record, so we're trying to figure out the best way to go about this, and they've actually come up with algorithms to simplify given phenotypes. What is being shared currently are the phenotypes that are developed and agreed upon by the whole consortium, the e-phenotypes that take so long to develop. There are about 40 of those now, and those are going into PheKB, the phenotype knowledge base, which gives you the algorithm for defining them, so not only can you get the data out of dbGaP on those phenotypes, but you can also apply them in your own population so that you can compare your results to what has come out of eMERGE. We'd like to share much more than that. 
I think that's something that we have to continually push on, but right now, the way it's done is that each of the sites is queried when there's a question about a phenotype that hasn't been developed; you have to go back and query them and work with the investigators. Dan, did you want to add anything? So, this is a continuation of a lunchtime conversation I just had with Trey. We're not about to upload our entire EHR so that you can use it; that's the short version, and that's the conundrum that we have, because as we talked about, the most interesting kind of work comes from people who are really looking at the whole data set, not just a little bit of it. The compromise right now, I think, is a little bit more than Trey suggested, because there is a set of 105,000 records that have GWAS-level genotyping and ICD-9 code diagnoses attached to them. So that's more than just 40 diagnoses; it gives you an opportunity to look across about 1,800 diseases, and that's been pretty useful for a lot of work. It's not the entire EHR with all the natural language processing you might want to do, but it's better than a poke in the eye with a sharp stick. I would love to figure out a way, and Trey will know the answer to this. I know that we can access it; I assume that people outside of eMERGE can access that data set as well. The ICD-9 codes? I'm not sure that those are available, and I would ask my colleague, Rongling, if maybe you could comment; she's coming to the microphone. I know that Brad Malin at your site has figured out ways to simplify the ICD-9 codes, so it sort of folds them up into units that are less identifiable. So we use something called PheWAS codes, a term I don't like because it comes from this one particular approach called PheWAS, so I prefer the term phecodes. It's very simple. 
There are, I don't know, 75 codes for schizophrenia, and what we do in the phecode world is roll them all up into one code. So you lose a bit of granularity, but it makes the data more available, I think, and it also makes the searching a little bit easier. But if there's a researcher who wants to know whether a particular genotype is associated with a subtype of schizophrenia, then that gets lost. Basically, eMERGE shares its data in three ways. One is dbGaP. dbGaP has the phenotype data that's generated following these algorithms within the network, and it also has the genotype data. We now have imputed genotype data for about 46,000 individuals, imputed against a haplotype reference panel using the Michigan imputation server pipeline, and that is already in dbGaP, so anybody who wants to use it for an association study can do so. That's the first thing. Second, we have SPHINX. That data can also be shared with the scientific community; it has the pharmacogenomics data, the actionable genotype variants, and also some aggregated data, not individual-level data. And the third, as Dan mentioned, is the phecodes. A phecode is not a single ICD-9 code for one specific clinical phenotype; one disease might be coded with one or multiple ICD-9 codes. The phecodes involve no NLP, no natural language processing, because that is so difficult. So that data can also be shared. For the dbGaP data, you can submit a data access request through NHGRI. If you want to use eMERGE data specifically, like the phecode data I mentioned, or other data you want to look at in detail, you can apply for affiliate membership in the eMERGE network. 
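The phecode rollup described here is essentially a many-to-one mapping from ICD-9 codes to a grouped code. A minimal sketch follows; the specific code values in the mapping are illustrative assumptions, not entries from the real PheWAS code tables:

```python
# Illustrative many-to-one rollup of ICD-9 codes into a single
# phecode-style group. The mappings below are made-up examples,
# not the actual PheWAS code tables.

ICD9_TO_PHECODE = {
    "295.00": "295.1",  # schizophrenia, simple type (example)
    "295.30": "295.1",  # paranoid schizophrenia (example)
    "295.90": "295.1",  # schizophrenia, unspecified (example)
}

def rollup(icd9_codes):
    """Map a patient's ICD-9 codes to their set of phecode groups.

    Unmapped codes are dropped; many source codes collapse to one
    group, which is where the loss of subtype granularity comes from.
    """
    return {ICD9_TO_PHECODE[c] for c in icd9_codes if c in ICD9_TO_PHECODE}
```

Collapsing dozens of related codes into one group trades away subtype detail in exchange for easier searching and less identifiable units, exactly the trade-off described above.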
So you collaborate with eMERGE network investigators to get additional detailed data that we have not deposited to dbGaP yet, though eventually it will be deposited to dbGaP. Yes, so thanks. Can I ask Carol first? Go ahead. So this might be related to Trey's question, but one of your recommendations is to expedite variant classification by machine learning and crowdsourcing. Was there discussion about specific data sources to do that? I mean, to do machine learning, you need positive and negative cases. Right, and this is where I just tried to comment: this is a major focus of, for example, our ClinGen grant, and there they are pulling large sets of, for example, familial hypercholesterolemia patients and cases. I believe in this case the idea was that the patients for whom you do have this phenotype data from eMERGE, the patients who have been carefully electronically phenotyped, could serve as ways to facilitate machine learning. So that was part of the thought. Yeah, Eric? Sharon, your presentation highlighted a tension that I felt at the meeting, between breadth and depth. The breadth approach is: there are a lot of problems in this area, so you just fund the very best people, and they come with their own disease, their own clinical setting, and their own biases, and as much as possible you try to harmonize and make the best of it. Versus a focus on a few high-priority areas, where everybody does the same thing and you increase sample size across diverse clinical settings. Do you have a feeling for how that tension can be managed, given that the title is the future of eMERGE? Yeah, I mean, my sense, and others who were at the workshop can comment, was that for the future the preference was to focus on the latter; that was important given the size of these studies. 
These are not small studies like the UDN, where you're really dealing with individually unique patients; the preference would be to focus a bit more on a consistent set of important medical problems that are handled similarly, and there still might be individual pilots or other things that sites would do, but with more consistency. I know that with regard to the discussion of return of results, the differences in how the results were being returned did limit somewhat what you learn from each site, and it limited having large numbers, particularly because most patients don't have a positive result. It was clearly a tension throughout the entire workshop between these two issues, but that was my sense of how the group in the end came down: pick a few important problems and try to make them somewhat more consistent across the sites to get power. But I'm happy to have others who were there comment. Any other questions? Okay, thanks Sharon. So our next presentation will be done by Jim Ostell, and Eric will make the presentation.