So we're going to try to do this a little bit more as a tag team. We were challenged to talk about enhancing the functionality of EHRs for genomic research, including e-phenotyping, integrating genomic data, transportable CDS, and privacy. It's not quite as long as Mark's, but almost, I think.

The first thing we wanted to talk about was the importance of informatics in an EHR for genomic medicine, and we thought one way to do that was to give a couple of example use cases. The use case that I'll talk about is eMERGE. As we've talked about quite a bit already today, we think there are two really important elements to this. One is to show the value for discovery and to develop new evidence that one might use going forward. In the case of eMERGE, we were trying to accelerate gene-phenotype associations by using electronic health records to do data mining, to develop electronic phenotyping algorithms, and then to use those phenotyping algorithms to do genomic studies that would make new associations. I'll show you in a minute one way we've gone about doing the e-phenotyping piece of it. The next piece was to demonstrate the portability of the phenotyping approach. It goes without saying that across the eMERGE Network, not everybody has the same electronic health record system. And not only do we not all have the same electronic health record system, but in the instances where we do, they're really not carbon copies of each other; even the Epic installations, and we have quite a few, are quite different from each other. So we had to develop this process of portability.

The second broad piece of this, which is another role for electronic health records in genomic medicine, is the whole question of how you integrate actionable genomic data into the EHR. How do you deal with the volume of genomic information that's potentially available? How do you use it to fire specific clinical decision support at the right time, and yet not put the provider at risk of alert fatigue, because they're getting alerts every time something happens, even if it's not relevant in that particular case? And then how do we demonstrate that this kind of integration is scalable?

So our approach to phenotyping, just to talk about it for a couple of minutes, is to focus first on the phenotype of interest and then, in parallel, develop a case and a control algorithm. One of the first things we learned in eMERGE was that it's insufficient to use just a single data type; you can't just use diagnosis codes, for example. The only way the algorithms achieved any kind of robust ability to identify cases and controls, in comparison to gold-standard cases, was through a combination of data types, including not only lab values, diagnosis codes, and procedures, but quite often detailed processing of the actual language in text notes. That turns out to be a fairly important area for us. Once you had those algorithms, which were typically developed at a particular site using what was known at that site, there was a manual review process to assess precision. What we found frequently was that there was always a trade-off between the elements of the algorithm and how many cases or controls you were able to identify.
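To make the combination-of-data-types point concrete, here is a minimal, hypothetical sketch in Python of what such a case/control algorithm might look like. Everything in it, the field names, the ICD-9 prefix, the drug list, the HbA1c threshold, and the two-of-four evidence rule, is an illustrative assumption for an imaginary type 2 diabetes phenotype, not the actual eMERGE logic.

```python
# Purely illustrative sketch of a multi-data-type case/control algorithm for a
# hypothetical type 2 diabetes phenotype. Field names, codes, drug names, and
# the evidence threshold are assumptions, not actual eMERGE logic.

def classify_patient(record):
    """Label a patient as case, control, or excluded by combining
    diagnosis codes, medications, labs, and NLP-derived note mentions."""
    has_dx = any(code.startswith("250") for code in record["icd9_codes"])
    has_med = any(med in {"metformin", "glipizide"} for med in record["medications"])
    has_lab = any(value >= 6.5 for value in record["hba1c_values"])
    has_note = any("diabetes" in sent.lower() and "no evidence of" not in sent.lower()
                   for sent in record["note_sentences"])  # stand-in for real NLP

    evidence = sum([has_dx, has_med, has_lab, has_note])
    if evidence >= 2:      # requiring agreement across data types raises precision
        return "case"
    if evidence == 0:
        return "control"
    return "excluded"      # ambiguous patients are dropped from both groups

example = {"icd9_codes": ["250.00"], "medications": ["metformin"],
           "hba1c_values": [7.1], "note_sentences": ["Type 2 diabetes, well controlled."]}
print(classify_patient(example))   # -> "case"
```

Tightening or loosening a criterion like that evidence threshold is exactly the knob that trades precision against the number of cases you retain.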
And so there was a process to increase the precision, but to do so in such a way that you didn't throw out all the cases you might potentially have, and that became a real challenge. After that internal process happened at a particular site, we would iterate through it until we hit a 95% positive predictive value at that first site. Then the second step was to deploy it at a secondary site, where the secondary site would report what happened when those elements were used there. Inevitably, we learned that there were elements in the algorithm that were either unique to the first site, which needed to be addressed, or that could otherwise be improved. Then we went through the process of validating it at multiple sites and ultimately did the genetic association testing.

One of the things that was important in that process was to think about how we could share the phenotypes in a more computable way. One of the pieces of work done in eMERGE, and I don't have time to go into the details, was to develop a package that could be transmitted from one site to the next and essentially installed, providing at least the logic for the decision process, using a KNIME workflow approach.

The second big piece we wanted to talk about was the issue of data integration back into the electronic health record. This shows just one way of doing it; this is how we're doing it at Northwestern, and some similar version of this is being done in most places. The process begins with a CLIA laboratory returning results through some kind of secure data receiver. We've spent a lot of time focusing on how to get those lab values back not as a text PDF, but as a computable triple store that could be put into the system. That goes into something we've been calling an ancillary genomic system. The idea there is that if you had a whole genome sequence, you're not going to want to dump the whole genome sequence into the electronic health record, but rather only the things that are, quote unquote, actionable. And that brings into play the second component, which is an actionable variant knowledge base. Many of us are excited about the prospect of ClinVar or ClinGen ultimately being that actionable knowledge base. You could then apply the knowledge base against your ancillary genomic system through some kind of knowledge engine and put those variants, and only those variants, into the Epic interface, which can then deliver them in a variety of ways to physicians, through in-basket messages, best-practice alerts, or lab results, but also, equally importantly, to patients through their MyChart portal.

One of the key elements of this kind of system architecture is that it addresses a problem we've talked about several times today, which is the need to update as knowledge improves in the future. You can imagine this ancillary genomic system and the actionable variant database being constantly updated and rerun through the knowledge engine to refresh the information based on current best practices and current evidence, whether that upgrades or downgrades the recommendation for a specific variant.

So with that, I will turn it over to Alexa, who will talk about the next use case.
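Before the next use case, a brief illustration of the ancillary-genomic-system pattern just described: filter the full variant store against an actionable-variant knowledge base and push only what matches into the EHR, rerunning whenever the knowledge base changes. This Python sketch is hypothetical; the knowledge-base entry, the data shapes, and the push_to_ehr stub are invented, not a description of the Northwestern implementation.

```python
# Hypothetical sketch of the knowledge-engine step: only variants with a current
# actionable interpretation cross from the ancillary genomic system into the EHR.

ACTIONABLE_KB = {
    # (gene, finding) -> interpretation; imagine this refreshed from a
    # ClinVar/ClinGen-style source as evidence is upgraded or downgraded
    ("CYP2C19", "*2/*2"): "Poor metabolizer: consider alternative to clopidogrel",
}

def push_to_ehr(patient_id, gene, finding, interpretation):
    """Stand-in for the EHR interface (discrete result, alert, or portal message)."""
    print(f"{patient_id}: {gene} {finding} -> {interpretation}")

def run_knowledge_engine(ancillary_store):
    """Re-runnable over the full store whenever the knowledge base is updated."""
    for patient_id, findings in ancillary_store.items():
        for gene, finding in findings:
            interpretation = ACTIONABLE_KB.get((gene, finding))
            if interpretation is not None:   # non-actionable findings stay out of the EHR
                push_to_ehr(patient_id, gene, finding, interpretation)

run_knowledge_engine({"pt-001": [("CYP2C19", "*2/*2"), ("TTN", "c.2311A>G")]})
```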
So I'll be talking about the Undiagnosed Diseases Network, which is actually a different kind of network from the existing networks I've been hearing about today. That is to say, we have a group of seven clinical sites and two sequencing centers who've come together under one protocol that's run out of NIH. This is a program that actually grew out of an NIH intramural program headed up by Bill Gahl and others in the Clinical Center and beyond. So we have one protocol whereby people apply to the Undiagnosed Diseases Network; if they are found to be scientifically interesting on a variety of different parameters, they are accepted into the network and then assigned to one of the seven clinical sites to get, essentially, a full one-week workup.

What we've done, oh, did we just lose a slide? What we've done is reach agreement with all of the collaborators on the kind of phenotyping that we're going to be doing, the deep phenotyping that will then, of course, be correlated with the genotype. Every individual who comes into the system will have either a whole exome or a whole genome sequenced; Howard talked about that a little bit before, and we haven't fully decided how that will work. And to the extent that there are family members, they will also be sequenced. We have agreed on using the Human Phenotype Ontology to do rather deep phenotyping, and there's actually a star system involved as well, as to how deeply you characterize the individual.

There will be more work done, and this relates to the EHR, that is captured within the electronic health record system, and we're having some discussions about how much of those data will actually be part of the overall network. We've made a sort of distinction between above-the-line data and below-the-line data: what folks do at their clinical sites, which is below the line, versus what data they send above the line for use across and within the network, but also for later sharing with broader systems in a de-identified form. And, I see, there's a new version of Java available on my screen, which is outstanding; sorry, okay, I'll kill it.

So we're going to be talking about patients, and we do have a patient voice here; and actually, Rex, this echoes the story that you were telling before. This is Matt Might, who is an individual I consider a citizen scientist. He's actually a computer scientist who has pushed very hard to understand what the problem is for his son, has used social media to find additional cases, has been very successful, and is very much of the social media generation.

In addition, we're going to be asking individuals, as they apply through the gateway, to phenotype themselves. So the patients are going to be phenotyping themselves; they're going to be asked a series of questions, and we're collaborating with GenomeConnect on what that would look like and what that language would look like. Then there would be mapping from the GenomeConnect terms to the Human Phenotype Ontology, which is also an international effort; in fact, I think that's already happened.

As far as implementation, we are expecting to have very broad sharing of the de-identified data. Written into the RFA was that any data generated as part of this project will, in de-identified form, be deposited in dbGaP. So that's a definite; there's no argument about that.
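Returning for a moment to the patient self-phenotyping, a minimal, hypothetical sketch of mapping GenomeConnect-style survey answers onto HPO terms might look like the following. The survey phrases and the mapping table are invented; HP:0001250 (Seizure) and HP:0001263 (Global developmental delay) are real HPO identifiers.

```python
# Hypothetical mapping from patient-facing survey questions to HPO term IDs.

SURVEY_TO_HPO = {
    "has had seizures or convulsions": "HP:0001250",   # Seizure
    "was late to sit, walk, or talk": "HP:0001263",    # Global developmental delay
}

def hpo_terms_from_survey(answers):
    """Return the HPO term IDs implied by the questions answered 'yes'."""
    return {SURVEY_TO_HPO[q] for q, yes in answers.items() if yes and q in SURVEY_TO_HPO}

print(hpo_terms_from_survey({"has had seizures or convulsions": True,
                             "was late to sit, walk, or talk": False}))
# -> {'HP:0001250'}
```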
We'd like to go more broadly, and I think we have agreement with all of our collaborators that we can share our data with other public databases, such as PhenomeCentral, the Canadian group, or other groups; in fact, some of us will be in Budapest at the end of this month seeing whether we can share data more broadly with our European colleagues and beyond. We have created some research tools; we've been collaborating with Michael Brudno in Canada on the research tools around phenotyping and so forth.

Probably our biggest barriers, in the sense that it's taken us time to get going, have been around the fact that we need a central IRB. In fact, all seven clinical sites, some of which actually comprise several clinical sites of their own, Harvard being a case in point, had to agree to follow the same rules there. And we have a data sharing and use agreement that has taken a little bit of time, with a lot of back and forth among the various groups. Importantly, we are subject to FISMA, the Federal Information Security Management Act, and this has had some impact on how we're doing things; but in the end, I think it's going to put us in a much better position with regard to those privacy issues, while at the same time we want to share as broadly as we can.

So we're just getting started. We have not yet seen our first patient, although we're having a soft launch, if you like, of our system any day now. We'll be having five patients from each of the seven clinical sites, so we'll have data from 35 patients that we can analyze, look at, and see what we need to do to improve as we open it up more broadly. So that's it, I think; now I turn it over to Chris.

I think we've been through a lot of these organizations and groups today, so I'll do this relatively quickly. The whole notion of introducing sequence data into patient care is more than just the actionable variant. As we all know, the volume of data is potentially overwhelming for a lot of electronic health records, and taking raw sequence data, raw genomic data, into the EHRs is probably not a clever idea. I think what most of these groups have concluded, and what Rex illustrated at Northwestern, is the creation of the moral equivalent of a PACS, the picture archiving and communication system used for imaging. Imaging data is typically kept outside of the electronic health record, and only the good parts, if you will, the reports or the snippets or whatever, are actually imported into the transactional system of the EHR. It's clear that an analog of that is evolving in the genomic space, where the raw sequence data is placed in an offline or near-line repository.

And that brings us to the second point, incorporating the actionable variants. As we all know, the debates about what constitutes an actionable variant are rampant, and we're grateful for organizations like CPIC to help us sort that out. We're also grateful for efforts within CPIC, like Bob Freimuth's work, trying to identify how we actually render these actionable variants so that they can be recognized from a decision support point of view. It's not so difficult, from a research perspective, to characterize what an actionable variant looks like. But it is a bear to turn that into something that the average clinical decision support system can treat as a single entity or a single element and be able to act upon.
And the reason for that distinction is that novel variants are usually represented in a grammar, Human Genome Variation Society (HGVS) characterizations or the like, and clinical decision support systems cannot deal with a grammatical expression; they can only deal with named, discrete entities. And in that context, okay. What do you mean by grammar? You lost me. Some kind of... it's at this location on the gene, and this is deleted, and that's inserted, and this is replaced with that. You can express that kind of statement in the Human Genome Variation Society's grammar for characterizing genomic variation, so you can unambiguously represent the data, you can change the reference gene, and these things are interoperable. But from the perspective of a clinical decision support system they are not tags that a rule can readily fire upon. It's very hard to make a rule fire on some syntactic expression of a genomic reality. And that's what we're dealing with as the rubber hits the road in terms of clinical integration of this information into these systems.

Then there's the whole notion of electronic phenotypes; we've heard how eMERGE has been doing that, and indeed many of us have been part of it. But it's important to distinguish that from what has been characterized as deep phenotyping. Particularly when we're talking about genomic variants, the whole question of phenotype is a very slippery slope. We all know that phenotypes can exist at a molecular level, a cellular level, any arbitrary level of physiologic organization up to society, I suppose. To say that we have a phenotype at a patient level in the context of eMERGE: it has been a very surface phenotype, at the level of those variables that are readily available from an electronic medical record, manifest by things like drug use, a disease assertion, laboratory variation, and the like. That's just the veneer of what phenotype actually is. And while it's convenient and scalable, and we've made a lot of progress with electronic phenotypes, we have to be more thoughtful as we move forward.

But I'm going to skip ahead to the standards point for just a second, because it's important that we not get ourselves into the usual domain silos of how we characterize phenotype. We don't want the research community to do it one way and the clinical community to do the same darn thing a completely different way. These things ultimately have to be recognizable, scalable, and determinable from equivalent sources. And then there's the whole process of implementation and sustainability; IGNITE, I think, has been really at the forefront of that in terms of process engineering and understanding common pathways and common activities. Next slide. Over there.

So, the whole notion of gaps and opportunities: go to Midas, you get a muffler; come to me, you get to talk about standards. And I'm happy to provide one. But the point is that there are standards and there are standards; the old joke is that there are so many to choose from. And again, by analogy, we actually have NIH creating clinical data elements, or data elements about clinical observations, that, how do I say this politely, are completely disjoint from what is going on in the clinical information space, what is going on in the HL7 standards community, and what is going on in other centers dealing with the same problems. HL7, by the way, is not totally unaware of genomic characterization and the integration of genomic facets into medicine.
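To ground the earlier point about HGVS grammar versus discrete entities, here is a hypothetical Python sketch: an HGVS expression is first normalized to a named concept, and only then can a decision support rule key on it. The lookup table and the rule text are invented for illustration, and a real system would use an HGVS parser and a curated knowledge base rather than a dictionary; NM_000546.6:c.524G>A is an actual HGVS description of the TP53 p.Arg175His variant.

```python
# Hypothetical normalization from an HGVS expression (a grammar) to a discrete,
# named concept that a clinical decision support rule can fire on.

HGVS_TO_CONCEPT = {
    "NM_000546.6:c.524G>A": "TP53_p.Arg175His_pathogenic",
}

CDS_RULES = {
    "TP53_p.Arg175His_pathogenic": "Flag chart for cancer genetics referral",
}

def evaluate(hgvs_expression):
    concept = HGVS_TO_CONCEPT.get(hgvs_expression)   # grammar -> named entity
    if concept is None:
        return "No rule fired: expression not normalized to a known concept"
    return CDS_RULES.get(concept, "No rule fired")

print(evaluate("NM_000546.6:c.524G>A"))
print(evaluate("NM_000546.6:c.743G>A"))   # unrecognized expression: no rule can fire
```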
And yet, to my knowledge, despite that awareness, we have the traditional baronial divides of academic research communities going about generating these kinds of things with, in many cases, a blank sheet of paper, not effectively leveraging what is actually going on in the clinical space.

So, the whole question of interoperability. I think the advent of APIs, application programming interfaces, into electronic records will be fundamentally transformative. Historically, the way data was stored within an electronic health record was simply the way data was stored in that particular record. What's changed is the customer expectation that they can write an API call and get back what they expect, and it shouldn't matter whether it's Cerner under the covers or Epic under the covers, or Northwestern's Epic or Mayo Clinic's Epic or Johns Hopkins' Epic: you should be able to execute the API and get an expected thing back. This is new and different; it has really only been implemented in the past year or so on a prototype basis and has not yet achieved large-scale implementation. But the point is that as we establish standard interface methods into electronic health records that return predictable information reliably, that will transform the way we can query the record, interact with the record, and have decision support environments that actually manage it.

Then there's the sustainability problem, long-term access. This starts to get at the fragmenting of patient data across different providers, the fragmenting of information about patients not just across healthcare providers but frankly across ancillary sources, laboratory environments, genomic testing environments, and other kinds of environments. So data integration, I think, is really the task before us. Whether it's the patient who integrates that data, the research community, the provider community, or accountable care organizations, these are political issues. But the task clearly remains: we need a way to sustainably integrate and defragment information about patients if we truly want to make sense of it in a scalable way.

And then finally, the whole notion of consent and metadata. I think as a community, and I have to count myself among the guilty, we have not historically given enough thought to carrying sufficient metadata, and by that I mean where the data came from, the typical provenance data, the who, what, when, where, and why, and the consent information, the permission information, along as part of the payload with the data that we are actually managing and manipulating. We should never separate that metadata, especially the consent information, from the payload data. We should learn to think about these as self-describing, self-contained data objects that we can manage and treat as objects, so that we don't have to go searching for the consent information; it's part of the data. We don't have to go searching for the provenance information; it's part of the data. There are circumstances where provenance should be protected: privacy, confidentiality, do you need to know that? No problem, it can be encrypted, but it should not be pared away. And so there's the principle of how much information we actually carry along with the payload data. Storage is no longer an issue.
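A minimal sketch of what such a self-describing object might look like follows; the field names are invented for illustration rather than taken from any existing standard, and the point to notice is how small the metadata is next to the payload it describes.

```python
# Hypothetical self-describing data object: provenance and consent metadata
# travel with the payload instead of living in a separate system.

from dataclasses import dataclass, field

@dataclass
class SelfDescribingResult:
    payload: dict                   # the genomic or clinical result itself
    provenance: dict                # who, what, when, where, and why it was generated
    consent: dict                   # what uses the participant has permitted
    protected_fields: set = field(default_factory=set)  # encrypted, not pared away

result = SelfDescribingResult(
    payload={"gene": "CYP2C19", "diplotype": "*1/*2"},
    provenance={"lab": "Example CLIA Lab", "assay": "targeted panel", "reported": "2015-06-01"},
    consent={"clinical_use": True, "research_use": True, "recontact": False},
    protected_fields={"provenance.lab"},
)
print(result.consent)   # the permissions are always right there with the data
```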
I mean, to complain that we have to store an extra 50 bytes of data is not going to cut it in the 21st century, particularly when we're looking at gigabytes of information that it's characterizing and describing; it's just a tiny, tiny fraction. So the importance of metadata, and of carrying that information along, especially the consent, cannot be overstated. So with that, I return it to you, Rex. No? Oh, that's Alexa. Okay.

We skipped over the synergies, but that's fine. So what we're talking about here is what training opportunities already exist with regard to some of the issues we've been discussing, the standards and so forth. There are some training programs. The National Library of Medicine informatics fellowship and training program certainly exists as a longstanding program; 14 sites, academic centers across the country, have these grants. There is now additional training for the clinical informatics subspecialty, which I think many of you are aware of; that's focused pretty heavily on health IT issues, electronic health record systems, and so on. And then there are currently the BD2K training efforts. What I would say about these is that they are not coordinated efforts. So the question is whether the appropriate kind of training for the informatics issues we've been discussing here exists, and whether there's a way to bring some of these programs together and make sure that everybody is getting the same foundational concepts as they go forward and become independent investigators. We absolutely have to improve the pipeline of math and computer science skills, and I think we were saying that the earlier we start on that, the better off we're going to be. For example, in my program I train post-doctoral fellows, and it's way too late to teach them math and quantitative skills; and yet we have to do it. We have to do it. And so I think it's only Harvard. All right, point well taken. Okay, point taken. How about this? Okay.

So then I thought we would move into the discussion questions. We put our heads together and came up with a number of questions. I don't know if you can read them, but I'll read the first one, and that is: while individual projects might agree on data standards specific to their needs, how do we plan for and promote large-scale data sharing across projects and beyond? We have been talking about that here, but just to address Terry's concerns, what could the Genome Institute do here so that we are actually sharing across projects? It's one thing for us to share within our networks, and perhaps to share with a couple of public databases, but how can we do better at sharing across projects? So I guess I would ask our collaborators on the panel first whether they have something to say about that, and then maybe open it up to the group. Rex? Let's hear what the group has to say. Okay. I'm actually not in a good position. I think, just for the sake of making the discussion move, if you can read the bullets, especially those of you in front. Should I read them all the way through? Let's just open it up to anything. Okay, I don't think we need to read them all. Okay, fine. If people can read them, then that's fine. Okay, great.