 This session will go from now until approximately 12:30. And this session is going to be focused on data issues that can impact genomic CDS. At the outset I want to just let everybody know that we do understand the fact that these are not necessarily separable items, that data does relate to knowledge representation, which does relate to implementation. And so we expect that in the course of the discussion we will begin to identify some of those state transitions. But we do want to try and maintain focus as much as we can, and then it'll be up to Blackford and me to, at the end of the day, begin to synthesize the discussion from these three different areas into what's going to TSF for tomorrow. Oh, and I also want to introduce a new arrival, the late Dr. Eric Green. Is that my arrival time or my demise? Yes, it is your arrival time. I'm sure it has something to do with the Cardinals, but please, Eric, I'll give you a couple of minutes here if you want to just introduce yourself and say anything, any pithy remarks? Oh, wow, okay. Well, I am late, not because of my baseball Cardinals. I am late because of an institute director's meeting that got scheduled at the last minute. Sorry that I am here late, but I'm delighted to be here thanks to the folks that have put a lot of work into getting this meeting up and going. I appreciate it, and I'm really looking forward to the discussions we're going to have over the next day, or a little over a day now. Right, thank you, Eric, and thank you for being here. So I'm going to turn this over to our session moderators, Bob Freimuth and Jim Ostell. They're going to set up the discussion. 
I will be in charge of keeping track of people that want to contribute to the discussion, so continue to try and flag me, particularly those of you behind me, throw things at me or, you know, do whatever needs to be done to get my attention, and we'll make sure we get to you. I am keeping an ordered list, so if I see you, I'll try to acknowledge that and add you to the queue. All right, thank you, Mark. So as Mark mentioned, Jim and I are going to be moderating this session. The summary of the session is shown up on the slide. We are talking specifically about data issues that impact genomic CDS. For those that don't know me, welcome. I look forward to a really invigorating discussion here. My name is Bob Freimuth. I'm from Mayo Clinic. My background is in pharmacogenomics on the wet lab side, discovery and characterization. I spent some time doing functional prediction and bioinformatics, and have since moved into the clinical side, looking at application, data standards, knowledge management, and systems to support genomic CDS. So I'm coming at this from a variety of perspectives. My co-moderator is Jim Ostell. I'm the branch chief for the part of NCBI that builds all the production resources, which is as diverse as PubMed and PubMed Central, but also the Short Read Archive and GenBank and ClinVar and the Genetic Testing Registry that are relevant to this. My background is actually as a molecular biologist who accidentally discovered I needed computers, and I've been at NCBI for the last 30 years. And my interest in this, or my expertise really, is more on sort of the central data resource genetics side. So the way we're going to set this session up, and you'll notice it's called a panel, but the table up at the front that has four chairs for panelists is currently empty. 
It's because all of us are the panelists, so congratulations: in addition to being labeled as a thought leader, which you can put on your CV, you can also say that you were a panelist for this session. The way this is going to work is that we have a few questions, which are outlined in the agenda and shown up on the slide here, and we're going to use these as a framework to guide the discussion for the next 90 minutes. We have a few outcomes, a few goals for this session that we'd like to try to hit for each of these three questions. Jim and I will do our best to try to make sure that we cover all the bases as we go, but largely it's going to be an open discussion by the group. So just to remind you, the stated objectives for the workshop in general were three-fold: the first, defining gaps and strategies to close those gaps; the second, identifying health IT initiatives that support recommended strategies; and the third, helping define a prioritized research agenda for genomic CDS. So with those three things in mind, we're going to go through the three questions that are shown on the screen here. The first deals with data types that are essential for genomic CDS. The second has to do with the nature of genetic data. And the third has to do with challenges that might be unique to genomics data and its application through genomic CDS. So with that said, Jim's and my job is going to be to throw things out as topics for discussion and let that discussion ensue. We'll try to stay out of the way as much as possible and just make sure that we stay on track and keep hitting the points that we need to hit. So the first question is related to the data types that are essential for genomic CDS. And I would add to this: do standard representations for these data exist? Where are the gaps in these standardized representations? And I will hearken back to Dan's talk this morning related to what information is needed to help a clinician make a decision. 
Beginning first with patient-level and clinical data: what sort of things might we need to hit on, and what standardized representations would be needed to facilitate genomic CDS for these? Please. I would first like to point out, regarding standards, that there are multiple layers of standards. We usually talk about genomic standards, which usually emerge from large-scale projects where standardization is an outcome of an attempt to compile a large number of genomic profiles and interpret them. The second layer is actually a wider set of standards dictated by bodies such as the W3C, the World Wide Web Consortium, which sets the standards for knowledge representation, such as the semantic web technologies that are used for ontologies and other knowledge representation systems. It also sets standards for data linking, such as the Linked Data Platform 1.0, which describes how interoperability can be achieved for sharing knowledge. And so there are at least these two layers of standards, one more narrow and genomic-specific, another evolving in the wider community. The third may be specific to electronic health records, and that's the one I have the least knowledge about, but others may comment on it. Sandy. I'd argue that the data that you need really varies by the specific clinical decision support rule that you are seeking to implement, but the types of things that you could wind up needing are knowledge of exactly what variants were found in this patient, represented in an unambiguous way; data on what regions of interest were assessed by a test, again represented in an unambiguous way; and then the interpretation of variants, how variants were classified, the overall interpretation of the test, and how that test has been assessed relative to diseases, again needing to be represented in a way that you can rely on. 
And I do think, I strongly believe in everything that was said in the previous session related to the need for shareable representations of knowledge and shareable representations of clinical decision support, but in terms of actually beginning the process of implementing a clinical decision support rule, I think that this patient state, and obtaining it reliably, is often sort of the first step, and very difficult to do right now. So Sandy, if I can extend that a bit, in the context of this discussion about data, because I agree with you. We've talked about that from a patient context, patient state, this ability of the environment to recognize when a certain rule needs to happen, and we've implemented this in a relatively prosaic way, which is: if I order a medication, then I can look for specific genomic variants that might impact that medication. I mean, that's about as simple as it gets, but I think we're talking about things more completely. Can you give your thoughts about data that would be relevant to the patient state, or does it really encompass the entire universe of data that is captured in the course of clinical encounters? Yeah. So my thought is it really depends on the clinical decision support rule itself. So for example, if the clinical decision support rule is, I've learned something new about a variant, and I want to make sure that I get that knowledge to all clinicians where that variant was identified, then I need to make sure that I can identify that I have the data on where that variant was identified, and that I can act on that data. If I want to make sure that a specific region of the genome was assessed before a drug is ordered, because there's a pharmacogenomic effect that needs to be looked at, then we need to know what regions of interest were represented in that test. 
If I want to know whether a patient has had a genetic test that assessed a specific disease, then I need to know that. Am I answering your question, Mark? Yeah, I think so. And if I can, I know Jim is going to get in here as well, but this gets to, I think, one of the issues that we were talking about right before we went to break, which is the idea that, in some ways, what you're describing is a necessary element for all CDS. And I think we can probably all agree to that, that each individual CDS rule needs to understand all of the specific elements, including those that are present and those that are missing. And so, in some ways, one of the things that we're trying to think about is, since genomics can't take on all of CDS, what are the things within the realm of genomics that perhaps are not currently represented in data, either because of standardization issues, as was pointed out, or because of technical issues, and how does that relate to the point that you're making? Yeah, so I think that all of these different elements of information about what has been done from a genetic testing perspective on this patient need to be represented in structured form. And I think that, in some ways, it's a general CDS problem, because CDS always needs the base data that it's going to act on. But it's also a genetics-specific problem, because these are genetics-specific data elements. And I think that part of this is a standards problem. I mean, when our own institution generated the data that we're going to act on from a CDS perspective, I think it's much easier for us to stand up these CDS rules. When the data was generated by some outside laboratory, I think that we've got many more challenges associated with doing this well. 
And to some extent that's standards, but to some extent that's also creating, as Adam was talking about, the ecosystem, the connections and transport mechanisms for getting this data to move, which involves setting up processes and infrastructure of various types. Yeah, I struggled with the same question about what we are going to talk about here. And so to help me think about it, I tried to follow the suggestion here and imagine the ideal case and then try to backtrack from that. So my ideal here, if we take the simplifying assumption that we're just talking about a germline genome, is that when you're born, we sequence your genome. Somebody does. And I would say it's not a perfect world, because it's not a perfect genome, let's say, where the technology is not perfect yet, but it's comprehensive as opposed to region-specific testing. The reads are put somewhere, and there is a group in the research community, coupled with the clinical community, who actually evaluates best practices once every two or three years or something. And every set of reads in this central place is then reassembled, realigned to the new set of references using the current best practices, the variants are recalled, and those are made available. And then the clinical decision support step begins with: we took best-practice reads at this point in time, and now we're applying the knowledge of the connection between phenotype and genotype, and this thing continues to cycle in the background. And to me, that's sort of what's unique about a genomic CDS, trying to separate it from any other CDS where you're also going to need to know demographics and family history and all the other things. And so my question is, how unrealistic is that? How far away is it? Is every hospital going to store the reads and assemble them and change with each new technology? 
Because the genome persists over time, at least the germline genome, is it going to be reanalyzed automatically as technologies change? Will people be notified? So, sort of in that world, because I'm thinking we're not going to resolve this particular issue in the next year; maybe in five years we're actually in that domain. So James, I think that what you're describing is, in the current state, completely unrealistic. And I think that's one of the issues that Sandy is trying to get at. And the questions here are: what is the clinical decision support engine going to fire off of? And you're proposing that it fires off of the raw reads. Right now, there are many, many intervening steps in between. And I think that that's really the challenge: defining what those intervening steps are and what standardization needs to happen to support those intervening steps. So just to correct, I'm saying it would fire off of the called variants. So it's fired off of the called variants, and even that may be somewhat unrealistic, because there are so many different varieties of called variants in an individual gene, and being able to support that number of rules, even with a large nationwide group of people working on it, may be untenable. And so there could be standards for saying how the individual calls are reported, and then other standards for saying how those individual calls are classified. And that was my initial question about the desiderata: is there a separation between the calls and the classification and the clinical decision support? And what's lacking right now, and what actually, I love the concept of ClinVar and I think that it's moving towards where it can be actionable; one of the steps that's missing there is that piece of standardized clinical interpretations that can be fed into clinical decision support. 
So someone can say, I have this CYP2D6 variant, or any number of CYP2D6 variants, and all of these variants fit into this clinical classification, so that a clinical decision support engine can say: all I need to fire off of is the classification. I can be aware of what the variant is, but the rule fires off of the classification, and any number of variants can have that classification. So that, at least from what I see right now, is completely missing, and I see components of ClinVar that may lead into that. But I don't see them structured and robust enough in a way, kind of thinking, could these potentially be used as flags for clinical decision support? I would encourage you to think about ClinVar as moving in that direction, to say that between gene and condition and, I'm looking at ClinVar now, clinical significance, between those three flags, someone can fire a clinical decision support rule that would be informative. Yeah, and I think to some degree that this is one of the key things that we're focusing on in the ClinGen project, because we've all recognized that, as useful as ClinVar is, it does not have that. And so it's not to say that that's not within the purview of discussion, and there's clearly a lot more to do, but there's at least some intentionality in terms of trying to create that type of knowledge repository. Now, again, it's another gap to say, well, how do we get CDS to fire off of whatever that is going to end up looking like? Absolutely, it's great that you're thinking about that. Thank you. Yeah, Jim, go ahead. Yeah, so ClinVar by design is not expected to get all the way up to clinical support. It's really intended to be a layer on which you build that classification. So it's meant to be sort of the broad collection, and there are other people here involved in ClinGen, which is that next tier that you're talking about, and we should let them talk. 
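As an editorial illustration of the pattern described above, where the rule fires off the classification rather than off each individual variant, here is a minimal sketch. The star-allele table, the classification labels, and the codeine rule are invented for illustration; they are not drawn from any real knowledge base such as ClinVar.

```python
# Sketch: a CDS rule keyed to the *classification* layer, not to each
# individual variant. The lookup table below is an illustrative stand-in
# for a curated, expert-maintained classification resource.
VARIANT_CLASSIFICATION = {
    "CYP2D6*4": "poor_metabolizer",
    "CYP2D6*10": "intermediate_metabolizer",
    "CYP2D6*1": "normal_metabolizer",
}

def classify(patient_variants):
    """Map each observed variant to its clinical classification."""
    return {VARIANT_CLASSIFICATION[v] for v in patient_variants
            if v in VARIANT_CLASSIFICATION}

def codeine_alert(patient_variants):
    """Rule logic references only the classification, so new variants
    assigned an existing classification need no new rule."""
    if "poor_metabolizer" in classify(patient_variants):
        return "ALERT: reduced codeine efficacy expected; consider alternative"
    return None

print(codeine_alert(["CYP2D6*4"]))
```

The point of the separation is that when a newly described variant is classified, the rule fires for it automatically; the rule set does not grow with the variant catalog.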
But the other thing I want to be clear about with my proposal was that I'm assuming that we do collect the genome, we call the variants, there is a resource like ClinVar, and there is a resource which has the classifications. Those also need to be reviewed. I wasn't thinking you start as low as the called read. What I was proposing is rather: what are the central resources? Where does the data reside? Is it within every hospital? And at what point do you branch from sort of a common pooled set of operations and data storage to what happens within each care facility? Where does that happen? Just to comment on our clinical setting: I'm part of the MCW Children's Hospital of Wisconsin. If there was no diagnosis, we actually will do a re-analysis, normally at six-month or yearly intervals, in our case triggered by the request of the patient to come back in if no diagnosis was made. And at that time we will go all the way back to the read data if our pipelines have changed significantly. So everything from the reads through the variant calling through the bringing in of the reference data gets updated at that point in time. So there is a huge flux of data that I don't think is seen in some of the other clinical decision support settings. And that can be every six months for each patient. Now, the informatics doesn't trigger us to reanalyze and therefore potentially come up with some more actionable data for that patient. It's triggered by the patient coming back in, because we don't think it's appropriate to be emailing every doctor every time one of the pieces of data changes; they would get hundreds of emails a week saying you need to go and find this patient to tell them the data has changed, and it may have an impact on their clinical care. We think that's too onerous on the healthcare system currently. 
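The reanalysis policy just described, re-run from the reads only when the patient returns and the pipeline has changed significantly, can be sketched as a simple trigger check. The version identifiers and the function are hypothetical, meant only to make the trigger condition concrete.

```python
# Sketch of the trigger logic described above: reanalysis from the reads
# happens when the patient comes back in AND the pipeline has changed
# since their last analysis. Version strings are invented examples.
CURRENT_PIPELINE_VERSION = "2.1"

def needs_reanalysis(last_analysis_version, patient_requested=False):
    """Re-run from the read data only on patient request with a changed
    pipeline; knowledge changes alone do not page every clinician."""
    return patient_requested and last_analysis_version != CURRENT_PIPELINE_VERSION

print(needs_reanalysis("1.8", patient_requested=True))
print(needs_reanalysis("2.1", patient_requested=True))
```

The deliberate design choice, per the discussion, is that new knowledge accumulates silently and is only applied at a patient-initiated encounter, keeping the notification burden off the clinicians.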
Although, and I have everybody in the queue here, I want to get to Sandy and Heidi, because this is something that they've been looking at, again more from a traditional single genetic test perspective: as we call something a variant of uncertain significance and that variant classification changes, there have been some approaches of saying, well, when we know it's really important enough to flag to a clinician, how might that be done? So I'm assuming that you're raising your hand to talk to that specific issue. Yeah, I think there's a couple of things. One is, as we think about genomic data, it exists at various levels of quality. And so the clinical decision support environment that we've supported, which has automatic firing when knowledge changes, is only operating off of validated, confirmed variants, so that we know that patient has that variant, and then we can focus on knowledge changes in an easy way. I think the challenge we all deal with is when we're dealing with the whole genome or exome or large regions that haven't been specifically interpreted and reported with validated variants; there's a quality issue. And right now the standard is you don't put the whole genome in the medical record, because so much of that data is not correct, right? And I think this issue you brought up, Jim, with the reads: my feeling is that in order to build an environment that really would enable you to almost query the reads in real time and have them reanalyzed at a later point to then see what's accurately there, I think our technology is changing fast enough that the effort to build such an interactive system would not be worth it, because I think we're hopefully going to get to high-quality variant calls sooner than the need to have a system supporting reanalysis of read data. But getting back to some extent to Sandy's point, the context of accessing this data, and what information you access from it, is really critical. 
So if the context is firing a direct knowledge support rule, you have to have validated variants. If the context is, did the patient have their BRCA1 gene analyzed at all, because they just conveyed to me a family history of breast cancer, you may be able to tolerate a different level of quality of understanding of the BRCA1 and 2 genes to simply say: were they sequenced? Is there data somewhere? And then this question of: were any variants possibly found, and does that change the a priori likelihood that my patient sitting in front of me actually has a real and high risk for breast cancer? And maybe you even want to be able to say whether there was a variant called in there that wasn't automatically ruled out as benign with high-frequency data, even though I can't trust it because nobody confirmed it and it might be a false positive. Do you allow a world where one can actually ask, was the area sequenced, and were there any variants called that might potentially be pathogenic, while not allowing that physician to act on what was not a technically confirmed result? And that's where I see an incredible challenge of allowing any access to unconfirmed data while trying to support these questions that will come in to the clinicians on a routine basis. And I don't know what the perfect answer is. Back to the comment over here proposing the system: we've already proposed such a system and built a prototype of it, where you've separated out the decision support knowledge from the genome data, and even have a structure that separates out the interpretations of variants. And the decision support looks at the gene and interpretation as part of the decision support, simplifying the logic as much as possible, because you can't write decision support knowledge that looks at the individual variants. Practically, you have to just say, let's group these all as pathogenic mutations and run the decision support off of that. 
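The tiered access being described, answering "was the region sequenced, and was anything potentially pathogenic called?" without ever exposing unconfirmed calls as actionable, might look roughly like the following sketch. The record layout, field names, and the confirmation flag are assumptions for illustration, not a real EHR or lab schema.

```python
# Sketch: answer coverage and screening questions from unconfirmed data,
# while only confirmed variants are ever surfaced as actionable.
# The data shapes here are illustrative assumptions only.
def region_summary(gene, test_record):
    covered = gene in test_record["regions_assessed"]
    calls = [v for v in test_record["variant_calls"] if v["gene"] == gene]
    return {
        "sequenced": covered,
        # Unconfirmed calls may inform the a priori likelihood...
        "possibly_pathogenic_call": any(
            v["class"] != "benign" for v in calls),
        # ...but only confirmed variants may drive an actionable rule.
        "actionable_variants": [v["hgvs"] for v in calls if v["confirmed"]],
    }

record = {
    "regions_assessed": {"BRCA1", "BRCA2"},
    "variant_calls": [
        {"gene": "BRCA1", "hgvs": "c.68_69delAG",
         "class": "uncertain", "confirmed": False},
    ],
}
summary = region_summary("BRCA1", record)
print(summary)
```

In this sketch the clinician learns that BRCA1 was sequenced and that an unconfirmed, possibly pathogenic call exists, yet the actionable list stays empty until a confirmatory result arrives.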
So we've proposed that, we've built a prototype of it, and we've noted that it works. We've even used ClinVar as the knowledge base to inform the interpretations of the variants. And on Twitter, for all those Twitterites, I posted the manuscript of that proposed system, so you have the link to it. And the paper where we actually built the prototype and evaluated it is coming out at AMIA, this next annual symposium. Yeah, so I think that that's something that there's been a lot of discussion about in ClinGen. And, you know, in some ways we do have a gene focus. But the issue with variant classification, of course, is that it's not a binary decision that a variant is disease-causing or not disease-causing. To some degree, even using something like BRCA, the impact of a specific variant may be more or less risky, and we may develop knowledge about that. So I think that we may need to begin to think at the data level about how that type of nuance could potentially be represented, even though it may not be necessary or even possible today, because our knowledge is not sufficient. Yeah, and I think if we have something that we start off with, it might not be perfect, but we build on it line upon line and say, look, this is what we have now. We've got the gene. We've got the interpretation. And then as we use it and get experience with it and find, hey, we have these other nuances, let's add this. But we can't wait to know everything possible about genomic interpretation before we build something. We have to start with what we have now. If it solves 80% of the problem, great. We're 80% closer to our solution. And then we iteratively refine and build upon that until we have a full, complete solution. Yeah, I just wanted to build upon actually the last two comments here. 
And that is to say that although we are missing, in a very painful way, standards that will represent molecular phenotypes in a robust fashion, we need to be careful about how those are constructed. I think there's a great amount of consensus towards getting something into place so that we can start standardizing how these things are reported on reports from clinical labs. But at the same time, as our knowledge continues to evolve, the standards that we use, those phenotypic terms and classifications, need to evolve similarly in a graceful way so that we can add more of the nuance based on our understanding and on the clinical context. Thank you. Well, the comment I wanted to offer, not being an expert in the genomic data representation space, is to make sure, as we consider those representational issues for the data and for the gene sequence and variants themselves, that we think about how we might encapsulate that representation in a way that protects it from, or separates it from, the other layers of representation. And secondly, it will be useful to have a notion of certainty attached to these representations, because the inferential problem might then proceed sequentially, if you will, from low levels of representation to intermediate levels to high levels. And this is commonly what we do, of course, in regular clinical CDS. You know, the abstraction of the diabetes concept is something the clinician is comfortable with, but underlying that is a variety of different levels of understanding. And the clinical decision support rule, check the hemoglobin A1c every six months, is interpretable. So this idea of abstraction and layers and certainty attached to representation would be very helpful when we get to the inference side. I think that's right on point. And just to take that one step further, we're in a space now where a lot of the interpretation that layers over genomics is probabilistic. 
And it's probabilistic because it's based on population measures. But to an individual patient sitting in an exam room, they're an N of one; the probabilities don't matter. I don't care what happens in the general population. Tell me what's going to happen to me. So I think you're exactly right. We need to have ways of expressing not only the precision with which those interpretations are made, but also give the clinician the tools they can use to translate those population metrics down to the patient level. Yeah, so I wanted to comment on your question, will the hospital keep the data? And I think we have to think about the genome being much more portable. It's about the individual, not about the health care provider. And patients go to multiple health care providers over their lifetime. So we have to make it easily available to those health care providers, not necessarily always in a situation where we're just basically spitting out the knowledge of a CDS and saying, this is what you should do, but having the data maintained using good data management principles in such a way that if we find a new genetic variant that is important in a particular disease, that particular piece of data can be reanalyzed by whatever health care provider is trying to look at it. The other thing is, whether it's cloud solutions or distributed computing environments, it really comes down to this when you look at a lot of these variants: genome-wide association studies have demonstrated, I think, fairly clearly that the common variants that we all tried to look at over the years have not necessarily proven very fruitful. And what we are now looking at is saying, well, should we concentrate more on the rare genetic variants? 
But when we concentrate on rare genetic variants, we're talking about very limited populations where those genetic variants exist, and therefore understanding phenotype and genotype relationships can be very difficult. So we should be coming up with not only a portable genome that allows for easy access by health care providers and patients themselves, but also centralized, I'm going to say databases, just for want of a better term, that allow us to continually look at genotype and phenotype markers and how they relate to each other, even when they're very rare. It's not going to be feasible for one health care provider to do that. It really behooves us to now look at this more on a population scale, and ask how we will really understand what decision support mechanisms we can put in place based on a continuous knowledge base. The problem with the continuous knowledge base, as I see it, is the fact that we do not have standardized sharing of clinical data from health care systems into central repositories. We are not talking just about an EMR. So when we talk about outcomes, rarely are they captured discretely. They're in physician notes and other types of documentation within an EMR or outside of the EMR, such as radiological systems. And so we're really looking at this as more of a holistic approach: how do we gather data, not just genomic data, but phenotype data, clinical data, and associate them so that we can start actually making reasonable decisions about what the genetic variations we see within genomes are actually doing, not just from a disease perspective, but also from a health perspective. Thanks. So I hear a bit of an echo of the adaptability, that sharing is not only important to push out information, but also then to capture, as Dan was talking about, what are the outcomes, and synthesize that knowledge across everyone so that we can learn more rapidly. So thanks. Mark, I have you next. 
I think, coming out of the commercial EHR world, one of the considerations that we had to focus on is that an EHR is a legally binding medical record, and one of its roles is to reproduce the information available at the time that a decision was made, in case there was a malpractice lawsuit and so forth. In this whole dynamic, fluid state of knowledge, I get concerned when we talk about called variants versus expert-interpreted results, which is more the standard that the legally binding aspect has to fulfill. So I almost wonder if the 15th element of the desiderata should be that the framework has to fit into the current legal framework, which has that threshold of fulfilling some legal obligations. And so if a result was interpreted today, and new knowledge is available tomorrow, but that result was not reinterpreted, how does the CDS know which version to adhere to? Legally it has to respond to the old state of interpretation. And I think in the big data world, veracity is an element of big data, but it's not one that we pay enough attention to; data provenance, as well as context provenance, I think is a really critical element for these discussions. Thinking about genomic data and use cases, I'm proposing a use case where individual information is shared between physicians in the context of care. For example, I have a colleague who asked me: I have a patient with an autoimmune disease gene mutation, she's six months into pregnancy, what can I expect in the next three months? Has somebody else seen a patient with a similar mutation, and what happened? What was the treatment? What was the outcome? And I've talked to a few of you who are compiling cancer mutation databases recording how certain tumor profiles were treated, and physicians may be interested in sharing information about similar cases. So this is a case where there's an incentive for data sharing, and genomic data is part of the picture. 
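The provenance requirement raised above, that the CDS must be able to recover the interpretation that was legally in force when a decision was made, amounts to a point-in-time lookup over a versioned interpretation history rather than an overwrite-in-place record. The dates and classification labels below are invented for illustration.

```python
from datetime import date

# Sketch: interpretations are versioned with effective dates, never
# overwritten, so the record can reproduce what was known at decision
# time. All entries here are invented examples.
history = [
    (date(2012, 3, 1), "uncertain_significance"),
    (date(2014, 9, 15), "likely_pathogenic"),
]

def interpretation_as_of(history, when):
    """Return the latest interpretation whose effective date is <= `when`."""
    in_force = [c for d, c in sorted(history) if d <= when]
    return in_force[-1] if in_force else None

print(interpretation_as_of(history, date(2013, 1, 1)))  # uncertain_significance
print(interpretation_as_of(history, date(2015, 1, 1)))  # likely_pathogenic
```

A decision made in 2013 is judged against the 2012 interpretation even after the 2014 reclassification, which is exactly the legal-record behavior described.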
So in that context, there is the issue of what data at the genomic profiling level should be shared in order to inform treatment of a patient. And the second is how to represent the non-genomic parts of the data. And there, I would say that there are a lot of emerging technologies that can handle the kind of messy, non-standard, semi-standard descriptions of disease phenotypes, conditions, and so on. I would particularly point to the Simple Knowledge Organization System, the SKOS schema, which is emerging as a standard on the World Wide Web as a method of sharing different concept schemes, for example, descriptions of conditions, diseases, and so on, that can actually help us link data in this messy world and identify similar cases across different institutions. Great. I would add to that, reflecting on what Mark had just said, also safe harbors for sharing, given some of the legal restrictions that we have about sharing certain types of data at the present time. Jamie, I think I had you next. Yeah, I just want to piggyback on one of the comments that you were making about the importance of the CDS going back not just to the providers, but to the patients as well. And so I just wanted to revisit really quickly number 10 in the list of desiderata, that CDS knowledge must have the capacity to support multiple EHR platforms. I'd cross out EHR and start thinking about HIT platforms. I see Brandon shaking his head, so we may take that as a friendly amendment to the desiderata. Jim Cimino. So in the old-fashioned, if we could call it old-fashioned, clinical decision support, the triggers for decision support were an event like writing an order or the arrival of a lab result. Now the triggers are going to be new knowledge, where the drug order has already been placed, the patient's on the drug, and something new comes in and suggests that we should modify that order.
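The SKOS idea raised here can be made concrete with a toy sketch. All codes, concept names, and case identifiers below are hypothetical; the point is only that when two institutions each map their local disease terms to a shared concept with skos:exactMatch-style relations, similar cases can be found across sites despite the messy local vocabularies.

```python
# Toy illustration of SKOS-style concept mapping across institutions.
# All codes, concept names, and case IDs are hypothetical.
SITE_A_MAPPINGS = {
    "A-1043": ("exactMatch", "concept:hypertrophic_cardiomyopathy"),
    "A-2210": ("closeMatch", "concept:dilated_cardiomyopathy"),
}
SITE_B_MAPPINGS = {
    "B-77":  ("exactMatch", "concept:hypertrophic_cardiomyopathy"),
    "B-901": ("exactMatch", "concept:long_qt_syndrome"),
}

def find_similar_cases(local_code, local_mappings, remote_cases, remote_mappings):
    """Return remote case IDs whose local code maps to the same shared concept."""
    _relation, concept = local_mappings[local_code]
    matches = []
    for case_id, remote_code in remote_cases:
        _rel, remote_concept = remote_mappings.get(remote_code, (None, None))
        if remote_concept == concept:
            matches.append(case_id)
    return matches

remote_cases = [("case-9", "B-77"), ("case-12", "B-901")]
print(find_similar_cases("A-1043", SITE_A_MAPPINGS, remote_cases, SITE_B_MAPPINGS))
```

The mapping relation (exactMatch vs. closeMatch) is carried along so a real system could weight looser matches differently.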
So we're going to have to figure out how to handle this; you know, once every X months we get a new knowledge base, and suddenly there's going to be 50 pages of alerts on every patient in the system. How are we going to operationalize that? So I just wanted to comment on a few things for the non-genomic part of standards. I understand we shouldn't and don't want to focus on it, but there is a lot of work going on, a lot of it coordinated by the ONC, relating to meaningful use, EHR certification criteria, et cetera. So I just wanted to make sure that folks know there is a big body of work there, and we should align in particular with what ONC is working on. And my observation is the community, it looks like including the vendor community, is starting to coalesce around FHIR. That's the sense I get, so just to bring that up. With regard to the genomic data, I have a question: it seems like there's a tension between saying the genetic and genomic raw data should be outside the EHR and maintained separately, and the notion that it needs to be part of the legal record. I just wanted to bring that up; I'd like people's thoughts. Should it be in the EHR or should it not? I'll take a stab. I think in the first desiderata paper, the model is that the derived data that's interpreted should be in the EHR. The vast uninterpreted data, where the clinical significance is not known or appreciated, should remain external. So I think it's a hybrid thought. And then the push-pull is that as new interpretations occur, if they're validated by an expert, then they can be pulled in or amended. So it's a dynamic hybrid model is what I advocate. Yeah, and I think that there's been some writing around that particular issue.
I think a couple of times we've talked about, again in the context of alert fatigue, when we have new knowledge, that there's gonna be a bunch of stuff that's gonna suddenly be filling people's inboxes. I think the reality is that while there may be new knowledge that's attached to variants in an individual that's had a sequence done, there is probably a very tiny subset of those that would need immediate action, and there will be a number of things that might then be triggered when the appropriate context comes up. But certainly the work that Les and others have done, looking at current parsing of sequences for what we would consider to be truly actionable variants in genes that we understand, or at least think we understand reasonably well, shows it's a very tiny fraction of people, maybe in the range of 2 to 5% depending on the number of genes and your filters. So I don't know that we should necessarily think of... genetics groups like this are great at the worst-case-scenario thing. We've done it particularly in the ELSI space. We dream up wonderful worst-case scenarios and build to those, whereas that may not always be the most productive. Robert, I had you next. One of the things I'd like to do is maybe step back to something that Ken just mentioned, and that is the knowledge representation issue. The representation of knowledge is in fact, at least in my opinion, one type of data. I wanna be a little bit careful here because we have an upcoming session on knowledge management immediately following this. But if we focus the discussion just on knowledge representation, I'm wondering, Ken and Blackford, you've both been very involved with this space, if you could maybe comment for the group on where we are and where the gaps are. Okay. So I'll comment on what I'm familiar with.
So the area we've been working in has been with ONC and CMS, where we've had two what are known as Standards and Interoperability Framework initiatives, public-private efforts sponsored by the federal government to define standards for various things; in our case, for patient data models and knowledge representation, and also for interacting with clinical decision support. One was called Health eDecisions, focused on decision support, which included the patient data model, a representation for order sets, documentation templates, and event-condition-action rules, so alerts and reminders, and also decision support as a service. There's one currently funded by ONC and CMS that I'm coordinating called the Clinical Quality Framework, which is taking those standards and harmonizing them with standards for quality measurement that you'll see, for example, in the current meaningful use quality measures. So in that regard, the data model currently looks like it's converging on the physical representation being FHIR with CIMI detailed clinical models, with a UML representation underlying it that we are calling QUICK, for Quality Improvement and Clinical Knowledge, which we just balloted in HL7. So there seems to be fairly good consensus there. In terms of the knowledge representation, there's an HL7 CDS Knowledge Artifact Specification, which is a draft standard right now, which allows for the representation of this kind of logic and has an expression language for expressing these logical criteria. So those are the knowledge representations. There's also a standard called the HL7 Decision Support Service standard, which provides a SOAP and REST web service interface for interacting with these kinds of knowledge bases. And a lot of these are actually in implemented systems, some in commercial EHR systems, in the VA, et cetera. So I guess what I would say with those is it is directly in the scope of these items.
The main thing is genomic medicine has not been included as a use case. I've thought about it, but it's the kind of thing where you need people who are really behind it to participate, and my observation is the genomics community has been pretty absent from these efforts. And I think that's an area where there could be engagement. I would only add a couple of thoughts to that excellent summary of the current state. I think the standards are moving forward. But you also asked, Bob, about what the knowledge management process is. On top of the standards and this collection of knowledge, we have to think through: how do we keep the provenance of the knowledge and the current state of the knowledge, be able to reproduce the state of knowledge at any time from a discoverability or legal point of view, and synchronize and coordinate knowledge engineering that might be occurring at these different levels? So there has to be some recognition of what low-level variant representation looks like, connected to the intermediate pathophysiologic state, and connect that to the rule or expression that might come to the end user. At Partners Healthcare and Vanderbilt, I was fortunate enough to be involved with large knowledge management teams that were focusing on this very problem, and they take things like FDB or new knowledge from guidelines or subject matter experts and then codify that in one of these types of architectures that Ken's described. The challenge there is that one can do this in a number of different ways, but the expertise associated with actually abstracting evidence from either guidelines or evidence-based data repositories is not well spread across the country. We did an estimate of how much it would cost to actually do the knowledge engineering for simple ambulatory clinical decision support. In this context, it appears simple.
And that alone is about 25 billion dollars if each and every clinic has to rediscover the same hemoglobin A1c alert rule. So the idea of centralization here is also very important, I think, to put on the table, to get the synergies of scale. And that's where a knowledge repository with these kinds of processes, based upon the kinds of standards that Ken's elucidated, would be the goal, in my opinion. So there's another component of genetic, genomic knowledge support that I think we haven't talked about a lot, and it's something that we've experienced in supporting our GeneInsight Clinic system for four years now, which delivers knowledge to physicians based on new genetic variant interpretations. And one of the things we've encountered is that when you're dealing with genomic data that lasts a patient's lifetime, it's a whole different timeline than the physician encounter with a patient. And so the ordering physician, in many cases, is not the person you need to alert. And that may be because it was a passing-through resident at the time, or it may be because you've done a genomic analysis and you're delivering breast cancer risk to a cardiologist that ordered the test for a cardiomyopathy diagnosis. And then you have the issue that physicians don't wanna be alerted to information when they're no longer caring for the patient because they saw them on a consult basis. And then we had another variable doing this for somatic cancer: I'm not gonna alert two years later on a variant that was found in a tumor, where that tumor is no longer in that same genetic state and likely the patient is deceased.
So thinking about how we may not necessarily always push alerts, but instead make new knowledge available so that any clinician going in to care for a given patient can access whatever knowledge is now available, and not necessarily think that we're gonna push it to an ordering physician and that it will always be relevant to that patient, given the somatic scenario especially. Yeah, the subscription model, I think, is really viable for the clinical workflow, because one, you're not inundating the clinician with alerts, but two, it also tackles the whole storage issue, where that whole genome does not belong inside the EMR. But the things that are relevant depend on whether it's a long-term problem, so germline, which lives like your blood type, your germline's not gonna change, versus something that's more somatic that kind of rolls on and rolls off the flowsheet based on the problem list. Yeah, so actually Heidi's point, I think, highlights a striking feature of genomic data, and that is that unlike clinical decision support, which is kind of owned and operated by healthcare organizations, this is a feature of yourself, right, that persists. And so, if something does become actionable because we learn about a particular form of molecular variation and its downstream effects, and we don't know what healthcare organization you're currently affiliated with, don't we still have a kind of ethical obligation to not simply ignore the data because we don't know which organization is gonna act on it? And so I think some kind of model that anticipates maybe that the final steward is the individual or their family or their designate or something, in addition to the traditional view that the hospital owns certain classes of data, has gotta be an important part of this model of how you provide decision support over time when you expect that some things may become actionable that are not actionable now.
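The subscription model described above can be sketched in a few lines. The class, variant, classification values, and subscriber names below are all illustrative, not part of any real system; the point is that reclassification notices go to the current subscribers (which could include a patient portal, consistent with the patient-as-steward idea) rather than only the original ordering physician, and that a provenance log preserves the state of knowledge over time.

```python
# Sketch of a subscription model for variant knowledge updates.
# Variant, classifications, and subscriber IDs are hypothetical.
from collections import defaultdict

class VariantKnowledgeBase:
    def __init__(self):
        self.classification = {}             # variant -> current classification
        self.subscribers = defaultdict(set)  # variant -> subscriber IDs
        self.log = []                        # provenance: (variant, old, new)

    def subscribe(self, subscriber, variant):
        self.subscribers[variant].add(subscriber)

    def reclassify(self, variant, new_class):
        old = self.classification.get(variant)
        self.classification[variant] = new_class
        self.log.append((variant, old, new_class))  # reproducible history
        # Notify current subscribers, not just the original orderer.
        return {s: (variant, old, new_class) for s in self.subscribers[variant]}

kb = VariantKnowledgeBase()
kb.subscribe("dr_smith", "MYH7:c.2389G>A")
kb.subscribe("patient_portal_42", "MYH7:c.2389G>A")
kb.reclassify("MYH7:c.2389G>A", "likely_pathogenic")
notices = kb.reclassify("MYH7:c.2389G>A", "benign")
print(sorted(notices))  # → ['dr_smith', 'patient_portal_42']
```

The append-only log is what would let a system reproduce the state of knowledge at any past date, the legal and discoverability requirement raised earlier.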
I think this is a really important concept, and we had it listed on our list of things: the role of the patient and family in this. Because I think, as Heidi's pointed out and Dan has emphasized, at the present time, at least in our system, the only consistent agent is in fact the patient or in some cases the family, and so there has to be some recognition that solutions that operate at a system level, moving data around, at least in our currently disintegrated system, may be decades away even under optimal circumstances. So one of the things, as we get to the implementation piece, would be to ask whether there would be solutions that would actively engage the patient-family-caregiver unit that's central to all of our systematic thinking related to delivery of healthcare services, where we could leverage the knowledge at that point. It raises a whole different set of issues, but I think it's at least something to put on the table. So I just wanna add to what Heidi said. When we're talking about data types, traditionally we don't think of the time aspect as an important element. So even in knowledge representation, we have to now start thinking about the time aspect and the evolution of symptoms as it applies to a genetic disease: when we look at a patient, we're just looking at a particular point in time for that patient, and while the patient may have a specific mutation, it may not have expressed at that point in time but could express later in life. Our EMR isn't structured to give us that view either. It doesn't give us the whole view, right? It's kind of episodic: when we come in, we look at the episode, we're dealing with an episode, and we sometimes can miss the point that it's part of a bigger picture and part of the evolution towards the endpoint. So I just wanna say that when we consider the data types, right?
I think we've gotta start thinking about the time aspect as a really important element of the knowledge representation itself, as well as the way we've structured our EHR. I just wanna jump in, since it's coming up here, that the value of the genome actually is even longer than that. It applies to the children and the grandchildren and on. So if we're gonna bring up time, just bear that in mind. I just wanted to share some aspects of our experience with the system that Heidi was talking about that may be relevant to some of the conversations that have occurred here. So this is a system that pushes knowledge updates to clinicians when our laboratory reclassifies a variant that was previously found in their patients. And what we found is the system pushes these knowledge updates in about 3.9% of cases per year as this knowledge evolves. And we have studied this over the last four years, and we found that while there is definite concern about patients moving and fear of being overwhelmed with alerts, clinicians in the germline space overwhelmingly very much appreciate these updates and want to receive them. In terms of dealing with the issue of documenting what's happening, the way that works in this system is there actually is a transaction. There's a knowledge event. It's not signed off by a geneticist, because of the concern that that would be overwhelming, but there's a transaction that goes into the system, used by the clinician, and it's recorded when that knowledge update occurred. And if it's a significant update, we also send reminders that they have to acknowledge that this change has occurred, to try to manage that part of the documentation process. I'm fascinated by that comment, because what I was gonna raise is related to it. It harkens back to something Liz said earlier. I'm interested to know, in CDS, non-genomic CDS, how is the "never mind" done and pushed out?
That is to say, this wonderful decision support we provided to you for your patient a year and a half ago told you to do these things; now we don't think that really matters anymore. My impression is that that is generally unappreciated by clinicians, and I again wonder about precedents, because we're gonna deal with that a lot unless we restrict ourselves to a tiny, tiny, tiny subset of things that we're nearly certain about. And so if we're gonna get into that, what's the experience in CDS of undoing prior decision support assertions that were made? Others can help as the case may be. You know, I think the truth of it is that today, most of the clinically oriented clinical decision support in practice is very cross-sectional. It's at a point in time. It has not really evolved in a sophisticated way to be a longitudinal or stateful model of decision support. So what happens is if there's an update to the knowledge base, for example a new drug-drug, drug-allergy, or drug-condition interaction alert, simply a new alert is provided. And the expectation from a provider's point of view is: that was then, this is now. So at the risk of turning it completely over to the Vanderbilt folks, I'm gonna go with Josh to talk to this point, and then Dan. So I was just gonna add that we've run across this problem operationally, because we release new drug-gene interactions in an environment where a number of people are already on the drug, and we also remap our variants to new interpretations. And so we've handled this a variety of ways. Sometimes it's manual, which of course is very labor intensive. We take advantage of our clinical messaging system. So we obviously need good systematic ways to handle that scenario that you describe. I think that when the information is sufficiently important, it is appreciated by providers.
In our case, there were a number of patients on high-dose simvastatin, and we were starting to release SLCO1B1, and we successfully managed that transition by contacting providers, and sometimes patients directly. I wanted to ask a completely boneheaded question, because I'm not a CDS guy and I'm about to make that really clear. I wanna know what discussion it is we're having here. And by that I mean, the CDS that Josh describes is all about interacting with physicians who are prescribing drugs, and it's an ongoing dynamic process. The process that Heidi is describing is a lab report. It's a genetic test result that is delivered once to a chart, and it has to be interpreted, it has to be readable, it has to be transportable, it has to be accessible to the family, but it's not a dynamic process in the same sense, as we're changing drugs all the time. So I'm not even sure that genomic CDS applies to the cardiomyopathy patient or the deaf patient or the cancer susceptibility patient. That's a lab report that somebody has ordered and some physician needs to interpret along with the lab, and the interpretation may change, but it's not conceptually the same thing as ordering clopidogrel, or suddenly delivering SLCO1B1 data and having physicians have to figure out on the fly what's going on. So I wanna know what we mean by CDS in this context. So Heidi, I have you in the queue, so why don't you respond to that initially? Yeah, so I think you're right, Dan, there are some differences, but at the same time, for example, when we issue a genetic report on a somatic tumor that says this patient has a variant that will make them sensitive to tyrosine kinase inhibitors, and that leads to the patient being put on that drug, but then the next week a report comes out and says, no, no, no, that variant is not sensitizing, you were wrong; we then issue an update and say we got it wrong, this variant is actually not, and the patient might be taken off that drug.
So there is some similarity. That example is just like the clopidogrel example, I concede. This makes it sound like we're arguing, and I don't think we're arguing, but when you deliver a genetic test result, and I'm not trying to devalue the effort that goes into those results, because I know, as you know, how hard those interpretations are, it's conceptually a little bit different. Well, the other scenario, and this gets back to Les's question about when you take away alerts, essentially: I view it more the way Blackford said, it's like a new alert. And one thing we do is, if we've reported a variant as likely pathogenic but we later discover it's benign, in some ways that's like saying this is no longer relevant, right? And so we deliver those: this variant is now classified as benign, and that often completely changes how the physician cares for the family, because instead of all the family members coming in for constant monitoring of their risk for that variant that's now been changed to benign, they can stop doing that. So it can be fairly dynamic depending on the actual utility of the genetic information in those reports. I certainly see a lot of the same principles operating in both of those use cases. The time element is obviously different, but I think there are similar themes, at least from my perception. Alex, I had you next. Well, speaking to the theme of time, as Jim pointed out, genomic information is unique in that it lasts a human lifetime and may be relevant for generations; somatic mutations in tumors, perhaps not. But there are layers of other omic information, like epigenomics, liquid biopsies, and so on, that may also have a shorter time span in terms of the value of the information for the care of the patient. But that information may be valuable for knowledge discovery, so it may still need to be retained for that other purpose.
So these two purposes, the value of that relatively transient information for knowledge discovery versus reliance on information for patient care, need to be decoupled and considered separately. Yeah, I wanted to make a comment not related to the last exchange, but just a plug for the use of other types of data to optimize genomic CDS. And the area that I'm thinking about is risk prediction, susceptibility to disease, where the models will likely have an element of genomics but also need to pull data from other areas of the EMR that are part of those risk prediction models, and actually may need to even pull data directly from the patient, such as family history. So a lot of this discussion has focused on the elements of genomic variant data, how that is pushed to the clinician, and how clinical decision support rules are derived from it. But there are clearly gonna be situations where many other data elements need to be pulled in to optimize that paradigm. Yeah, I think that's absolutely correct. And to some degree, I think some of the discussion that Ken was relating earlier is that, while it may not be within the purview of this specific group or this specific discussion to consider all other data elements, the reality is that to be able to contextualize, we have to be able to combine disparate elements; the genomic information is not going to be determinative in the vast majority of cases, probably even with highly penetrant single variants. Even in genes like BRCA, they're not gonna be solely determinative; there are gonna be other factors. I can say for the purposes of discussion, I think this is true: in the definition of genomic medicine that we generally use, family history is included as part of genomic medicine. I think that family history data suffer from many of the same issues that we've talked about with genomic data, in terms of the lack of promulgated standards and these sorts of things.
So in some sense, I would be inclusive of family history data within the context of the genomic data discussion. But what that opens up is the question of where you get the data from, and if the patient is now an active source of data that's important for clinical decision making, that's a whole area of standardization and issues that we have to grapple with as well. Right, and so that is something that we really haven't touched on, or we have touched on to some degree, at least in the broader context of where we direct the clinical decision support rule to draw data from; that's a data sourcing issue. And we talked about that in terms of attaching reliability and validity information to the data element per se. And so I think that if we think about that in a broader context, then patient-entered data like family history, or other things that we're increasingly relying on in clinical care, could also be annotated with: we need to point the CDS here, but this is patient-sourced data, and here's what we know about the reliability of that particular data element. So yeah, I think those are very good points. I had Casey next. So I guess along the same lines of trying to pull together data from multiple sources: with genomic information and genetic conditions, a lot of times we're dealing with rare conditions, and so being able to leverage some of these ontologies that relate phenotypes and conditions with important genes will be more useful. So when it comes to documentation, how do we leverage that to identify patients who should be flagged for decision support, as well as patients who should be aggregated for discovery-based questions? So to address the issue of the difference between a laboratory report and the pharmacogenetic decision: in the CSER EMR working group, we've been discussing this for several months, and we've come to a conclusion that's not very surprising.
There are many different forms of genetic information and use scenarios, and the decision support rules that would fire off the different scenarios are very different. That doesn't mean that there aren't still underlying issues about structuring the data and reporting the data even when the decision support rules may be very, very different, and sometimes it can be confusing when one person talks about something that may be really only relevant to pharmacogenetics, and another comment happens to be really only relevant to germline mutations, and so on. But, to reiterate what Mark said, there are commonalities between all of these, and so, just for those who aren't aware of the CDS realm, the differences between these can be addressed by having different firing rules. Clearly you could have somebody getting tested prenatally; there may be some genetics that are only applicable to prenatal care, where the only time that a decision support rule would flag would be when the person comes in to their OB-GYN for a prenatal or first pregnancy visit, where it says, oh, this person is a carrier of a cystic fibrosis mutation, has their partner been screened? And it would never, ever fire at any other time, because it's really irrelevant to their clinical care in every other situation. The same thing may be true for breast cancer risk, whether you're talking about an overall risk score or a specific variant, where a decision support rule may be developed to say that this rule will only flag either when a physician requests the breast cancer risk to be reevaluated, or when there's a dramatic change in the apparent risk of that individual because of some reclassification event, and maybe even only then if somebody has previously queried for that risk.
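The prenatal carrier example can be sketched as a context-gated firing rule. The encounter types, finding codes, and message text below are hypothetical placeholders; the point is that the rule fires only in the one clinical context where it is relevant and stays silent everywhere else.

```python
# Sketch of a context-gated CDS firing rule; encounter types, finding
# codes, and message text are hypothetical placeholders.
def cf_carrier_rule(patient, encounter):
    """Fire only at a prenatal visit for a known cystic fibrosis carrier."""
    if encounter["type"] != "prenatal":
        return None                      # irrelevant in every other context
    if "CFTR_carrier" not in patient["findings"]:
        return None
    return "Carrier of a cystic fibrosis mutation: has the partner been screened?"

patient = {"findings": {"CFTR_carrier"}}
print(cf_carrier_rule(patient, {"type": "prenatal"}))     # fires here
print(cf_carrier_rule(patient, {"type": "cardiology"}))   # silent: None
```

A breast-cancer-risk rule of the kind described would follow the same shape, with its gate being an explicit physician request or a reclassification event rather than an encounter type.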
So there are many types of decision support rules that can fire off of many different types of genetic information, and just for future discussion, sometimes it can be useful to clarify what type of genetic information you're talking about, whether it's pharmacogenetic information or cancer genetics, because some issues only apply to one or the other. Yeah, and I think the other thing I would note is that we're, again not unexpectedly, straying into knowledge representation around variants. An example of this would be CYP2C9*3, where we've traditionally thought about this from the perspective of how it influences warfarin dosing, but there's a recent publication out of Taiwan that says, hey, wait a second, this may be a risk factor for severe cutaneous adverse events if a patient's exposed to phenytoin. Well, the variant representation doesn't change, but the knowledge around it has to be contextualized in terms of: am I doing warfarin or am I doing phenytoin? Both of those, of course, are pharmacogenomic, but we're also going to have examples where a given variant will influence pharmacogenomics but could also impact risk or other things. An example would be perhaps some of the variants in RYR1, where there are specific pharmacogenomic implications, but it could also, for a patient that presents in the ER with what may be heat exhaustion, say, hey, this person's at risk for going into full-blown malignant hyperthermia, which is a very different clinical context. So again, this is the tension: we can represent the variant, and we need to represent it, but then we need to attach the knowledge to that as well. I think I had Ken next. Just more of a question, and I see Clem's here, so I specifically want to ask this.
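The separation described here, a fixed variant representation with context-dependent knowledge attached to it, can be sketched as a lookup keyed by (variant, clinical context). The advice strings below paraphrase the examples from the discussion and are illustrative only, not clinical guidance.

```python
# Sketch: fixed variant representation, context-dependent knowledge.
# Advice strings paraphrase the discussion and are illustrative only.
KNOWLEDGE = {
    ("CYP2C9*3", "warfarin"):
        "Reduced warfarin metabolism; dosing may need adjustment.",
    ("CYP2C9*3", "phenytoin"):
        "Reported risk factor for severe cutaneous adverse events.",
    ("RYR1 variant", "anesthesia/heat illness"):
        "Risk of malignant hyperthermia in this presentation.",
}

def advice(variant, context):
    # The variant identifier never changes; only the attached knowledge
    # differs by clinical context (drug, presentation, care setting).
    return KNOWLEDGE.get((variant, context), "No context-specific knowledge.")

print(advice("CYP2C9*3", "warfarin"))
print(advice("CYP2C9*3", "phenytoin"))
```

Keying knowledge on the pair, rather than on the variant alone, is what lets the same stored genotype drive both pharmacogenomic and disease-risk decision support.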
It seems like in the typical decision support realm, for these kinds of results, you think of it as OBR, OBX, you know, a grouper with individual nested elements, with LOINC as the agreed-upon approach to identifying what it is you're asking the question about, and the answer can be various things, and if it's coded, it might typically be SNOMED, for example. I'm wondering, does it seem like we can use that same paradigm in genomics, or does the paradigm break when we get here? Well, we now almost routinely get requests from the kit makers for LOINC codes, but there are some challenges lower down. You know, I tell the kit makers, we've gotta have RefSeqs with these, and they don't usually say anything about RefSeq. And then there's a lot of variability in how they conceptualize alleles: some have squished two allele reports into one field, or they have two separate fields. FDA could help a lot on that side of it if they would just give some guidance. So in those things, I think we can. And then when you get to the whole-genome reports, where you're doing the whole sequence of the whole genome or a lot of the genome, it gets more complicated. There is a structure HL7 has proposed, which uses LOINC codes, which lets you repeat a whole bunch of fields, you know, you do repeating loops and you can say all the things you wanna say, and there's actually a lot of activity in HL7 in this space, so I'm not sure how it will all fall out. But right now it's mostly narrative reports you're getting for the really complicated ones, which is useless, and I think we'd gain a lot if we just said: okay, if you've got a mutation analysis, first you always record the RefSeq. You should always report the mutations you found. You should probably do it more than one way: HGVS, as best it can be.
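The OBR/OBX grouper pattern, extended with the RefSeq and HGVS fields argued for here, might look roughly like the sketch below. The "XXXXX-n" question codes are deliberate placeholders, not real LOINC codes; the accession and HGVS values are illustrative; and the segment layout is a simplified shape, not a conformant HL7 v2 message.

```python
# Simplified sketch of an OBR/OBX grouping for one variant result.
# "XXXXX-n" question codes are placeholders, NOT real LOINC codes,
# and this is the shape of an HL7 v2 message, not a conformant one.
def build_result_segments(gene, refseq, hgvs_c, interpretation):
    segments = [
        "OBR|1||GEN123|MUTATION-ANALYSIS^Mutation analysis panel",
        f"OBX|1|CWE|XXXXX-1^Gene studied^LN||{gene}",
        f"OBX|2|ST|XXXXX-2^Reference sequence (RefSeq)^LN||{refseq}",
        f"OBX|3|ST|XXXXX-3^DNA change (c.HGVS)^LN||{hgvs_c}",
        f"OBX|4|CWE|XXXXX-4^Overall interpretation^LN||{interpretation}",
    ]
    return "\r".join(segments)  # HL7 v2 separates segments with carriage returns

msg = build_result_segments("CYP2C9", "NM_000771.4", "c.1075A>C", "Decreased function")
print(msg.replace("\r", "\n"))
```

The point of the structure is that the RefSeq and HGVS travel inside the same grouper as the interpretation, so the variant call is never separated from its coordinate system.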
I've just come to understand the ClinVar ID is probably the ideal code where it's available, and the dbSNP ID. I mean, you'd have three things for all of them, at least, because it's evolving. And then always report the variants you looked for if it's probe-based, or report the region you looked at if it's not, but we're not even close in that area right now. But one other thing: there are a lot of tests that aren't reporting genetic mutations, and we brought up the prenatal stuff, and there are now four ways to test for bad things in babies from a mother's plasma. Looking at DNA in the mother's plasma, it's wild, and it's dazzling actually, and it's like 99% accuracy. They're reporting probabilities, and some are Bayesian. They actually say they're Bayesian, so there's another layer there, and then they do it a lot of different ways, one to 2,000, and a little bit of standardization would go a long way, but we've gotta get them to line up. I was just gonna mention that we've been using the HL7 clinical genomics standard to encode some of our drug-gene interaction results, and we appreciate the work that's gone into it, and it's successful for a lot of the drug-gene interactions we report, but there's a lot of work that needs to be done to represent the remainder, a large variety of genomic scenarios, including the one you mentioned, where CYP2C9*3 has multiple overall interpretations depending on the context, so I think that should be a focus going forward. We're appropriately discussing standardization of the rules and back-end knowledge databases. One question I have, especially given the importance of clinician workflow and alert fatigue, is that it's sort of floundering a little bit because of clinicians feeling overwhelmed.
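The practice just described, always carrying several identifier systems in parallel on a lab report, can be sketched as a simple record. This is a minimal sketch: the RefSeq version number is illustrative, and the ClinVar variation ID shown is a placeholder, not the real accession; only the rsID for CYP2C9*3 is drawn from common usage.

```python
# Minimal sketch of a variant report entry that carries redundant
# identifier systems, as suggested in the discussion. Accession and
# version numbers below are illustrative assumptions.

def make_variant_report(refseq, hgvs, clinvar_id, dbsnp_id, assayed_region):
    """Bundle the parallel identifiers a lab report should carry."""
    return {
        "refseq": refseq,                  # reference sequence the call is made against
        "hgvs": hgvs,                      # HGVS expression, "as best it can be"
        "clinvar_id": clinvar_id,          # ClinVar variation ID, where available
        "dbsnp_id": dbsnp_id,              # dbSNP rsID, a third cross-check
        "assayed_region": assayed_region,  # what was looked for / looked at
    }

report = make_variant_report(
    refseq="NM_000771.3",                      # illustrative version
    hgvs="NM_000771.3:c.1075A>C",              # CYP2C9*3 defining change
    clinvar_id="VCV000012345",                 # placeholder, not the real ID
    dbsnp_id="rs1057910",
    assayed_region="CYP2C9 exon 7 probe set",  # probe-based: report what was looked for
)

# Because terminologies evolve at different rates, a consumer can fall
# back across identifier systems instead of failing on a single one.
preferred = report["clinvar_id"] or report["dbsnp_id"] or report["hgvs"]
```

The point of the redundancy is exactly the "because it's evolving" remark: any one coding system may lag, so the report carries all three.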
Should we seek greater standardization, or at least recommendations, on the actions of genomic CDS? And by that I mean, perhaps there's interruptive, non-interruptive, there are user-based color schemes, there are certain points in the workflow with certain triggers, and I don't see too much standardization in that department. Yeah, and I think that's a good point, and we may wanna tee that up particularly in the implementation session this afternoon, since I think that really gets into the whole user interface, user interaction, so if it's all right to maybe put that in the parking lot for a bit, that'd be great. Alex. Speaking about HGVS identifiers, it's very non-trivial to map them from the literature, because the gene names are missing, and that's actually a case of a wider issue, where genomic data is indexed by the genome. So whether we're talking about amino acid sequences, transcripts, or genomic DNA itself, the genome and reference assemblies now serve as an index for information, so it's important to acknowledge that utility of the genome and then to facilitate data integration by providing means to relate any molecular information to genomic coordinates. At least currently in the queue, I'm gonna put Chris Chute on the spot, because I'm stunned that with the word standards being flung around as frequently as it has been, Chris hasn't immediately started drooling and foaming at the mouth. So Chris, if you wanna weigh in on maybe even just the current status and how much you think the standards world is relevant to what we're trying to do here. Fair enough. I actually was heartened to hear our colleague from ONC talk about the S&I and FHIR initiatives for the clinical information. I certainly endorse and concur that that does seem to be the mainstream. With respect to genomic standards, we are yet inchoate, I think, at least with respect to clinical implementation.
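The "genome as an index" point, relating any molecular information back to genomic coordinates, can be made concrete with a toy projection of a transcript-level position onto the genome through an exon map. The exon coordinates below are invented purely for illustration.

```python
# Toy sketch of using the genome as an index: project a transcript
# (cDNA) position onto a genomic coordinate through an exon map.
# The exon coordinates below are invented for illustration only.

EXONS = [  # (genomic_start, genomic_end), forward strand, 1-based inclusive
    (1000, 1099),   # exon 1: transcript positions 1-100
    (2000, 2049),   # exon 2: transcript positions 101-150
    (3000, 3199),   # exon 3: transcript positions 151-350
]

def cdna_to_genomic(cdna_pos):
    """Map a 1-based transcript position to a genomic coordinate."""
    offset = 0  # transcript bases consumed by earlier exons
    for g_start, g_end in EXONS:
        length = g_end - g_start + 1
        if cdna_pos <= offset + length:
            return g_start + (cdna_pos - offset - 1)
        offset += length
    raise ValueError("position beyond transcript")

# A transcript-level position such as c.120 can now be anchored on the
# genome, which is what lets heterogeneous data sources be integrated.
```

Once every transcript-, protein-, or probe-level observation can be projected this way, the reference assembly becomes the shared join key the speaker describes.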
And as Bob Freimuth has basically instructed me over the past several months, our notation and nomenclature for genomic variants is, how do we phrase this, immature, unreliable, and not sustainable as a clinically deployable or implementable reference point. I mean, clearly things like the star allele infrastructure are collapsing under their own weight, and it seems self-evident that energy and resources, as pertinent to this genomic problem and decision support, should be invested in the question of how the heck do we maintain a national slash international reference representation of genomic variation that can in turn be embedded in redistributable logic specifications for CDS and others, and, more pertinently, can be reliably and reproducibly integrated into clinical systems. So I just wanted to follow up on that comment. This question of how we unambiguously define a variant, it's a really important one, and for a while I was focused on defining the genomic reference sequence, and we could define a variant by its genomic coordinates, but as much as that seems simple and straightforward, those of us who work in that space know it's not. And in working with the ClinVar team at NCBI for a while now, they have been, behind the scenes, assigning a variant ID, Jim can speak to this, to every variant, and sometimes that's a collection, a haplotype, almost like in the star allele world.
These two together define this star allele that is clinically relevant. So we've now gotten to the point where the only place you see that ID right now is in the URL in ClinVar, and we're asking them to put it front and center, because I think that might be a paradigm to actually use in clinical decision support: that we truly define what we mean by a clinically relevant variant, which can unfortunately have a lot of variability, with a single ID that can be transmitted and used within healthcare IT systems. And I'll just put that out there, because we've been talking about it extensively within the ClinGen project as an idea. Yeah, that's a good point, thanks Heidi. That brings up a number of issues that currently aren't resolved, as you know, and there's debate in the community about the degree to which pre-coordinated or post-coordinated systems should be used to represent this sort of information, and I think that's probably getting a little bit deeper than we need to right now, so I'll just leave it there. What I would like to do, since we're starting to run a little bit low on time, is make sure that we touch on the last two questions that were assigned to this session. I think we've hit the third one a little bit in this present discussion, and we touched on the previous one, that is, question two, in our last discussion, but I want to loop back to it for just a few minutes. Question two talks about the massive nature of genomic data and how that might influence the development and implementation of genomic CDS. Again, I want to remind everyone that there is an upcoming session. Session number three is going to focus on implementation, so we don't need to necessarily hit all those points right now, but if we could talk about the data issues that are relevant to this, I think that would be helpful. One of the things that I wanted to point out is, if you reference your sheet, desiderata item number two talks about lossless compression.
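The lossless compression desideratum just mentioned can be shown in miniature: raw genomic data may be stored compressed so long as the round trip is exact, byte for byte. A small sketch with invented, repetitive variant rows (the coordinates in the sample text are illustrative):

```python
# The "lossless compression" desideratum in miniature: store raw data
# compressed, but only if decompression restores it exactly, so no
# information is discarded for the sake of storage. Row content is
# illustrative, not real coordinates.
import gzip

raw_calls = b"chr10\t96741053\trs1057910\tA\tC\n" * 1000  # repetitive variant rows

compressed = gzip.compress(raw_calls)
restored = gzip.decompress(compressed)

# Lossless: restored bytes are identical to the original, and the
# highly repetitive text compresses dramatically.
```

The design question raised in the discussion is then whether any *subsetting* on top of such storage is done for clinical policy reasons or merely for performance.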
That is certainly a factor here that we could consider. Another thing that has come up that I'd like to call out as a possible seed for discussion here is whether we should be designing our systems for the ideal state rather than for the current technological or architectural limitations that we see within the systems we have. That would mean looking at, for example, taking desiderata number two as the example here, do we subset our data by policy, because we think clinically that is the right thing to do, or do we do it for performance reasons? To comment on the question of do we go for the perfect state versus what would fit into the current environment: this is something that we all run into. In a lot of the things we're working on, on behalf of ONC and CMS, it comes up all the time. Do we, for example, if it's a quality measure or decision support, model a data element the way we think it should be in the EHR, or do we model it the way it is in the EHR? Because there's significant pain when you say, well, we'll talk about data in a way that's not actually in the EHR but we think the EHR should represent. I think it's a similar kind of notion. I think it should be done very, very carefully, and only if the benefits clearly outweigh the costs, because, bottom line, perfect is great, but we first need to get to good. Yeah, so I really think for this question we do have to go back to the concept of, do we separate the clinical decision support from the research itself? For subsetting data for clinical decision support, I actually see that as something we could do, and it would actually be a benefit to healthcare providers, but from the research perspective, trying to find new variants that are actually involved with disease, we have to have the full data sets, and so to me, you really have to separate out those two use cases.
I bring that up because we really need to make decisions about how we store and use data, and part of that, and this came up for those of you who were at the AMIA policy meeting a couple of weeks ago, where we had a fairly extensive discussion about this, is that we are really talking about big data analytics. How we wanna use data from a research perspective versus how we wanna use data from a clinical perspective is really now starting to separate. When HIPAA was written, we didn't have Facebook, we didn't have Twitter, and we weren't trying to figure out how to use data analytics to start doing discovery. Today we have the human genome and those other types of things, and so I think it is going to be important for us to have that separation, so that we can maybe get around or modify some of the HIPAA and HITECH rules to allow us to actually do analysis, to really focus on what is clinically actionable and therefore what should be involved in the clinical decision support. As we both thought about this and wrote about it, the idea of a better, more perfect knowledge representation standard actually attaches nicely to the not-yet-extant public library, right? Since that thing doesn't exist, what the books on the shelf would look like also doesn't exist, and so you have a kind of clean sheet of paper for a more full-featured knowledge representation, with the expectation that every organization is gonna have to have some kind of ETL transform to take that thing into what really works in their own environment. So rather than shooting for an operational common denominator that's quite heterogeneous and quite low, you could actually have something more full-featured but independent, naturally independent because it's hosted by another organization on behalf of the entire community, with a future expectation of the use of the data.
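The ETL idea just described, a rich central knowledge record transformed down to whatever a local CDS engine can actually run, might look roughly like this. Every field name and identifier below is hypothetical, invented for the sketch.

```python
# Sketch: a full-featured, centrally hosted knowledge entry is
# transformed (ETL) into the lowest-common-denominator rule a local
# CDS engine can execute. All field names are hypothetical.

central_entry = {
    "variant_id": "VCV000000001",        # placeholder central identifier
    "gene": "CYP2C9",
    "contexts": [                        # one variant, multiple drug contexts
        {"drug": "warfarin",  "effect": "reduced clearance", "evidence": "A"},
        {"drug": "phenytoin", "effect": "toxicity risk",     "evidence": "B"},
    ],
    "provenance": {"source": "central library", "version": "2013-05"},
}

def etl_to_local_rule(entry, drug, min_evidence="B"):
    """Extract one drug-specific alert rule from the richer central record."""
    for ctx in entry["contexts"]:
        # evidence grades compare lexically here: "A" is stronger than "B"
        if ctx["drug"] == drug and ctx["evidence"] <= min_evidence:
            return {
                "trigger": f"order:{drug}",
                "condition": f"has_variant:{entry['variant_id']}",
                "message": f"{entry['gene']} variant: {ctx['effect']}",
            }
    return None  # context not covered at this evidence bar

local_rule = etl_to_local_rule(central_entry, "warfarin")
```

The central record stays rich (multiple contexts, provenance), while each site extracts only the slice its environment supports, which is the point of the clean-sheet argument.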
So in a sense, you can dodge the question by virtue of the clean sheet of paper that's offered for where it would live and how you would maintain it. That sounds like a very governmental approach of kicking the can down the road. I like that, Jim. So I was just wanting to speak to this massive-nature question and some of the comments that were just made about big data analytics. So far, most of our discussion has really been about germline variants, and obviously somatic is listed here, which is much more fluid, many more time points of sampling, actually more data. We haven't brought up RNA-seq, we haven't brought up epigenomics, and we haven't brought up the microbiome, all of which are approached by basically similar technologies, and in all of which we see massive scale-up in terms of agnostic, high-throughput, whole-sample types of data. And I would throw out for context, in case you're not aware, we're involved in a number of projects with CDC and FDA to use whole genome sequencing for pathogen surveillance, where FDA is sequencing the shipping containers and the restaurant salad bars, and CDC is sequencing clinical samples that come to them as part of reporting. In fact, it's so powerful that there have already been regulatory actions based on the pilot, which isn't even standardized yet. And in addition to greater sensitivity, what they're seeing is the power of historical samples: they've found, for example, clinical isolates which match historical samples from food processing plants, suggesting that a food processing plant wasn't totally cleaned up, it wasn't cleared out, and now it's back. So that's a pooling question which is even bigger than what hospitals do, because say the hospital does the microbiome, you'd still wanna pool it with FDA and CDC and global surveillance, and there will be healthcare decisions from that.
So I just throw that out to pull us back up to the unique issues here, which is scale. I assert that none of that stuff is gonna be in the electronic medical record. All of it should be pooled, all of it should be persistent, and all of it should be reprocessed periodically as new algorithms and approaches to data discovery are found, and each time you reprocess the whole thing, you will find stuff you didn't see before that somebody might wanna know about. So just throwing that out for your consideration for question two. So one of the things that occurs to me in that discussion, I had flagged it, I think it was when we were in the first session, is the idea that there seem to be public health aspects to what it is that we're talking about here, at least as a potential output, and I don't mean to jump to day two here, but at least in terms of flagging it: what is the role of the traditional delivery system, which is engaged in the public's health, versus the public health system, which we tend to compartmentalize into, well, they do disease surveillance and they do immunizations and that sort of thing? I think we all recognize that everybody's in when it comes to public health, but we've not really been systematic about how we do that, and would this be an opportunity for study about how to do that? I think, Clem, did you have something that you wanted to interject at this point? I was just talking to my boss out in the hallway, so I didn't hear the discussion, but I'm sure I would. Okay, well, let's get you caught up and then we'll get you back in. So Brian, I think. So we're talking about kind of a push and a pull mechanism, where you have the underlying data in some as-yet-undefined box that perhaps does not live in the EHR, because the EHR can't handle that data, at least right now, and maybe the clinicians don't even wanna handle that data.
And with the push mechanism, somebody orders it, or somebody decides at some level that this data needs to get put into the EHR, put somewhere where the clinical decision support is firing off of it. With a pull mechanism, the decision support rule is saying there's something out there, a piece of information that potentially no one's asked for but that might be necessary or might be useful, and that can be very powerful, but there you're moving into the realm of screening. I mean, moving into the realm of public health, like Mark mentioned. And I just want to submit that one of the reasons for not necessarily wanting to go there, except in very specific instances, is that usually for screening there needs to be a much higher level of evidence. And the reason why is because once you enter into screening, any incidental finding and any adverse event is iatrogenic. And so I love Dan Masys's desiderata saying that that level of information needs to live separately in the EHR for many reasons, not just technical reasons; I think there are also health reasons why that level of information needs to be separate. And there should be concerns about having automatic systems that query this underlying information, beyond whether it's validated. Even if it is validated, even if it is robust, the level of evidence that needs to be there for this querying of the information, to have unrequested results, unrequested alerts fired, unrequested information queried, really needs to be much, much higher. It's nice to hear an informaticist who's conversant with Bayes' theorem, because it seems like a lot of my genetics colleagues tend to forget that very point, which is extremely well taken, Les.
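The Bayes' theorem point behind the screening argument can be made concrete: with a fixed test sensitivity and specificity, the positive predictive value collapses when the prior probability is low, which is why unrequested population-wide querying demands a much higher evidence bar. The performance numbers below are illustrative, not from any real assay.

```python
# Why screening demands a higher level of evidence: for the same test
# performance, positive predictive value depends heavily on the prior
# (prevalence). Sensitivity/specificity/prevalence values are
# illustrative assumptions.

def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(condition | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Diagnostic context: clinician already suspects the condition, prior ~30%
ppv_diagnostic = positive_predictive_value(0.99, 0.99, 0.30)   # ~0.98

# Screening context: unselected population, prior ~0.1%
ppv_screening = positive_predictive_value(0.99, 0.99, 0.001)   # ~0.09
```

The same excellent test that is near-certain in a diagnostic workup is wrong roughly nine times out of ten when fired against an unselected population, which is the iatrogenic-risk argument in numerical form.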
I hear what you're saying, but I don't actually see that that's what we're currently doing in areas outside of genomics in medicine, in that we have different standards for deliberate prospective screening than we do for incidentally acquired information, and I think the chest x-ray nodule is the best example of that. Everyone agrees that it's inappropriate to screen the population with chest x-rays to find such nodules, because we know that the yield of that is terrible, and yet it is uniformly, as far as I'm aware, practice across the entire field of radiology that when an x-ray is done for another reason, if such a lesion is identified, it is reported. And we do the same thing with the physical exam. We find skin bumps all the time, and we deal with those, and we don't set out to do population-wide screening for little skin bumps. So I think we have to acknowledge that when you have the information, you have different obligations than when you set out prospectively to acquire it. That's exactly what I'm talking about, and so I'm saying that we should not develop a system that does the genetic equivalent of MRIs on everyone, for that exact reason, and we don't want to develop a system that identifies every possible incidental finding on every individual, but we need to be very aware of when we're looking for those and when we're not. Yeah, and I think that this is an important issue, but one where there's a potential for us in this discussion to get bogged down and distracted, in the sense that I think we should take the approach, much as the CPIC guidelines do, which is to say: if we have the information, for whatever reason, and we want to use it, then how can we leverage clinical decision support to actually do that? So we can focus our discussion around that aspect and not address what is clearly a very important concept, but perhaps, for the purposes of this meeting, a bit out of bounds.
Well, I'm just addressing that some of this genetic information, which we might have, we might not want; we might need to pretend that we don't have it, or we might need to have it but only have it queried, as opposed to saying we have it and we've analyzed it and we are obliged to report on it. Right, and those of us that are involved in these discussions, such as CSER and eMERGE, God knows there's been a lot of ink spilled, the equivalent of blood spilled, on that contentious battlefield, and so I'd just as soon not recreate the Battle of Hastings at this meeting and let my ELSI colleagues do that. But it is an important point not to forget, and I think as that becomes adjudicated, or perhaps a standard emerges, assuming that's the case, then that will be relevant to inform how CDS is applied. Robert, Jim? So one of the things that I've noticed throughout the morning's discussion here is that, although we've touched on it a few different times, there's one topic that has not come up explicitly, or at least as explicitly as I was thinking it might, and that's the issue of provenance. We've talked quite a bit about how rapidly knowledge can change and the impact that has on the interpretation and use of genomic data, but we've got maybe 10 minutes if we're lucky, maybe a little bit less, and I'm wondering if we could have a brief discussion on what sort of provenance might be needed, and I guess metadata could be thrown into that, around not just the genomic data but the methods used to produce it, to analyze it, et cetera. Yeah, I think right now that's critical. The different technologies, the different algorithms, the way you set up the algorithms, as far as the way the parameters are set, make reproducibility a huge issue.
We've seen this, of course, in the microarray world, where things are generally not reproducible, but a lot of that is because we don't have the rich metadata that we really need to reproduce those experiments. So I think it's absolutely critical that we set up a standardized metadata structure that allows us to capture not only what platforms the experiments were done on, but also exactly what the algorithms were and what their parameters were, down to the version level, so that any healthcare provider can reproduce that analysis if they want to. I think it's just critical. And I would say that a number of the diagnostic entities that are doing this clinically do register and store that data down to that very fine-grained level, and many people are working on addressing standards in that area. Of course, one of the problems is, even if you know exactly what pipeline and which tools and which versions and which reference data someone used, the next time you go in and sample that same patient, you get a different sample of the possible DNA from that population. You're not necessarily guaranteed to get the same result. Along that line, I was at a meeting last week that the FDA hosted, and there was a session on Genome in a Bottle, which is a group that's trying to really nail down the precision. And I was kind of shocked. They were reporting 5, 10% differences across platforms, because when you've got a billion samples, you can have a teeny error rate and still get a lot of funny stuff. Yeah, I think the paper from the Stanford group, where they looked at the same patients on two different sequencing platforms, and then even re-sequencing on the same platform. It's those types of things that should make us all very circumspect about the enthusiasm with which we move this into the clinic.
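The standardized metadata structure being called for, capturing platform, pipeline, and parameters down to the version level, could be sketched like this. Every field name, tool name, and version here is an invented placeholder, not a proposed standard.

```python
# Sketch of a provenance record capturing what the panel later dubs the
# "three P's": platform, pipeline, and parameters, all versioned so an
# analysis can in principle be re-run. Field names and versions are
# illustrative assumptions.
import hashlib
import json

provenance = {
    "platform": {"instrument": "sequencer-X", "chemistry": "v2"},
    "pipeline": [
        {"tool": "aligner",        "version": "1.4.2"},
        {"tool": "variant_caller", "version": "3.1.0"},
    ],
    "parameters": {"min_mapping_quality": 20, "min_depth": 10},
    "reference": {"assembly": "GRCh37", "patch": "p13"},
}

def reproducibility_key(record):
    """Deterministic fingerprint: identical provenance yields an
    identical key, so two labs can check whether their analyses are
    even comparable before comparing results."""
    blob = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

key = reproducibility_key(provenance)
```

A fingerprint like this makes version drift visible: change any parameter or tool version and the key changes, flagging that the two analyses are not directly comparable.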
But we also have to recognize the fact that there are many of us that are not daunted by these sorts of things, and the push with new technology in this country is frequently that we move it out into the clinical space for a variety of reasons, be they intellectual, economic, or whatever. So in some ways, I think this is a critical issue to address: we're giving you information, we're giving you instructions about the information, but how confident should you be? And particularly when we're talking about interventions that can be life-changing, relating to prophylactic surgery and things of that nature, I think that's why a number of us are saying we have to do additional steps of validation, both on the laboratory side, to be certain that what we're calling as a variant is truly there, but also to try and contextualize it using as much other information as we can before we actually act on a clinical decision. And I think only a subset of those activities can realistically be put into a clinical decision support system, other than to say, hey, before you start to make decisions, you might wanna consider these different things. Alex. Managing information about validity, whether the data is validated or not, may be an important aspect. So validated data may be required for clinical decision making, but non-validated data may still be useful for research purposes. And that may argue that we need both to record a level of reliability, whether the data is validated, and also to keep the data that's not validated. Perhaps that information is not fit for electronic health records, but it may be fit for another repository of genomic information that accompanies them.
You know, that's a really interesting point, and Terry and Eric might want to weigh in on this, because in a number of contexts within NHGRI we've certainly talked about the idea of how we can utilize information out of clinical transactions and how we can close the loop, so that we can not only push things out into the clinical environment, but capture information from it. And how we then appropriately represent this as clinically valid versus more discovery-oriented can be problematic. But it seems like an area where we've had some discussions in other contexts, and I'd be interested in your perspective on where you think NHGRI currently is on that. Yeah, I think you've described well where we are, which is: we're wondering how best to do this. I think we have a few programs, IGNITE and eMERGE being probably the main ones, that will allow us to capture some of that. At least we hope so, in their current incarnations. But we don't do this well, and we need to find a better way. Thanks. A little unrelated, but hopefully an easy question to answer. We're trying to define a theoretical ideal state, and that'll be a terribly valuable contribution. Regarding the goals of the conference, are we also trying to identify those open, complex issues that would benefit from further research, study, or pilot tests? And for the topic at hand, are there research projects or targeted pilot tests that would be useful in determining the optimal forms of genomic data types for CDS? Yeah, so I think the answer to your question is yes, we would like to identify potential targets of opportunity, whether for definitional aspects or pilot projects or those types of things. So hopefully, in the synthesis from this session and also from the broader overall meeting, we would love to come away with some things where we say, here are some really good ideas that could potentially be studied systematically. Yeah, Jamie. No, no, no, I'll just make this really brief.
It's a question that maybe we can talk about during the next session, when we're talking about implementation, but we talked about the idea of metadata, and when we brought it up, we immediately went to the front end of knowledge generation, in terms of the validity and quality of data. This is just a harebrained scheme, and I don't know if this has been discussed anywhere else, but I want to introduce the idea of using metadata around CDS rules to match with what the physician is doing or what is being looked at in the EHR, so that might address some of the alert fatigue, and I'll just leave it at that, and maybe we can talk about it in the next session. Yeah, I think you're exactly right. I think there needs to be a lot more thought put into what sort of information is captured in a time-dependent fashion as knowledge changes and CDS rules are written and revised. So, knowing that we are now at the end of our allotted time for the discussion section, we've got a few minutes here for what's been titled Wrap-Up and Summary, which is going to be fairly informal. I've tried the best I can to take notes during the session. I'll go through a few of the things that I've highlighted, and then I invite my co-moderators to pick up as well, but knowing that we now stand between all of you and lunch, we will try to keep this succinct. So some of the themes that I heard coming out of the discussion over the last hour and a half or so are as follows. First of all, we recognize there are different types of decision support. Not everything is an active interruptive alert. There is also a more passive mechanism that might be more akin to a traditional lab report, and these different types of CDS may not require different types of data or different representations of data, but may simply operate on the same set of data in different ways. There was a great phrase that I wrote down because I thought it captured the essence perfectly.
What does the CDS engine fire off of? That's at the core of what we're trying to discuss here. We talked about the knowledge that's needed and how it will potentially vary by the different CDS rules. This includes the very basic information about the genomic variants that are found, the sites that are interrogated, and how that information is interpreted or assessed. And we as a community are currently missing standard interpretations that can be fed into these CDS engines. In some cases, in fact, it may be that in the future, in many cases, the triggers are not going to be traditional drug orders, as we think of them in the pharmacogenomic community, but the triggers will actually be new knowledge. That new knowledge, we said, would most likely be made available on demand, as a pull mechanism, rather than through a push model, to avoid overloading. I think that's an important point to keep in mind, because it turns on its head some of what may be considered traditional practice in this area. We have a quality issue. We need to keep in mind that not all calls and interpretations are at the same level of confidence, nor is the same level of rigor needed in all clinical contexts. That's something else that is potentially new to this space. In addition, patients are mobile. We need to be able to maintain their data over time and share it among providers as those patients move around. There are questions related to creating centralized repositories for knowledge on a population scale; this is something that most certainly would not be feasible for each care facility to do on their own. There are also legal obligations that we need to keep in mind, especially with respect to reinterpretation of genomic data and what impact that has for follow-up with patients.
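The "new knowledge as the trigger" idea summarized here can be sketched as a tiny event handler: rather than firing only when a drug is ordered, the engine re-scans stored genotypes whenever the knowledge base changes. All patient IDs and variant identifiers below are hypothetical.

```python
# Sketch of knowledge-triggered CDS: a knowledge update, not a drug
# order, is the event that fires the rule. Identifiers are hypothetical.

patients = {
    "pt-001": {"variants": {"rs1057910"}},   # carries the variant of interest
    "pt-002": {"variants": {"rs9999999"}},   # does not
}

def on_new_knowledge(variant_id, implication):
    """Fired when a knowledge-base update arrives: find affected
    patients whose stored genotype has just become actionable."""
    alerts = []
    for patient_id, record in patients.items():
        if variant_id in record["variants"]:
            alerts.append((patient_id, implication))
    return alerts

# A new publication links an already-stored variant to a new drug risk;
# no order was placed, yet the genotype on file becomes actionable.
alerts = on_new_knowledge("rs1057910", "phenytoin: severe cutaneous reaction risk")
```

In practice this would be a pull against a repository rather than an in-memory scan, and the evidence-level concerns raised earlier about unrequested alerts would gate what actually reaches a clinician.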
There was recognition, of course, that genomic data and its interpretations may be valid not only for a patient's lifetime, which is by itself an interesting aspect of genomic data compared to some other types of lab measurements, but that this data may also have relevance to immediate family members and descendants. And so there are some unique aspects around maintaining this data and the subsequent interpretations for potentially longer than the patient is alive. We recognize that data analytics have advanced significantly and that we need to have a way to analyze the data in the context of, or perhaps in spite of, HIPAA. We have different types of genomes, of course. Germline, somatic, and microbiome were all mentioned. There are other aspects to this that simply go beyond what SNP you have in a particular gene. We talked about the implications for public health, looking specifically at surveillance and immunizations. We talked a bit about incidental findings and how genomics may or may not be different from other screening mechanisms. And finally, the last bit that I've pulled out here is related to provenance: the fact that it is critical for reproducibility, and what I've just now dubbed the three P's, platforms, pipelines, and parameters. So with that, that's the end of my highlights, and I invite my co-moderators to add. I think that was a very good summary. I would only add that, in addition to the changing genomic landscape here, I observed recurring themes about the fact that there's changing sociology and politics as well, where the existing healthcare model isn't a good fit for the reality of how data is collected, and the maximum utility of that information requires pooling across multiple organizations and legal jurisdictions. I would just say that's not unique to this area.
I brought up the surveillance as an example of where CDC and FDA suddenly find themselves having to get along, and where legally there are sort of separated domains there. I think you're seeing it everywhere, from the Facebook world to the Google world, to the NSA, I guess we have to add. And so I think we need to bear in mind, when we want to make short-term advances, that we're doing this in a very changing landscape, and there's no way you're gonna plan for that huge pooled future. So you must, if you want anything real, take steps that are within the present world, but bear in mind that it's changing fast, and I think on a couple-year time span, the world will look very different. Mark, Blackford? Yeah, I think that you highlighted the things that I pulled out as well. I guess I'd just open it up to the group to get an endorsement of the key things that were pulled out of this particular session. In particular, if there are any points that you think were very important that were not captured, or if you have a significant disagreement with either something that was captured or how it was captured, this would be the time to raise that. That's right, everybody's got a blood sugar of 40, and so we will be revisiting these, of course, but I think what we'll do at this point is declare victory for this particular key question. I wanna really thank Bob and Jim for doing an excellent job of leading this topic area. I also, in my introductory remarks, neglected to thank Duke for doing all the meeting logistics for GM7, and so that's Geoff Ginsburg, who's the titular face of this, but Teji and Rita are doing all the heavy lifting, so please, if you have an opportunity, thank them for their outstanding work. We will adjourn for lunch and then we'll reconvene at 1:15. And lunch is outside; it looks like drinks and desserts are inside.