So, I think we're ready for, let's see, who is number 17? Clinical implications. So, that's David Adams, and you just need to be hooked up here, and you should be ready to go. But, yeah, I think to the degree that we maybe refine this table a bit in the paper, that would be very helpful. And just to take a step back and think about the first things that you do when you find a novel gene. So, is the gene expressed in the tissue that's relevant to the disease, those types of things? That just seems like a good first step, if it's actually expressed, and that helps. Increases plausibility. Absolutely. Sure. So, we've talked extensively about all the things that we don't know and that are hard to know, so I'm going to talk about how to use all that glorious uncertainty in the clinic. This is the group of people in our working group, and as with some of the previous speakers, the stuff that we wrote down in the outline was a group effort, but this is largely mine, so I invite clarification and corrections if needed. I should mention that Ewan offered to change his name to David, but in retrospect, I don't think he was serious. So I'm going to talk through five brief sections. I'd like to talk a little bit about a specific context that we've touched on today but haven't talked about explicitly, which is the clinical context, the actual practice of using these tests in the clinic. I'd like to talk about communicating variant information and returning results, then a composite clinical example, and then how a report might be formatted for clinical use. So to many clinicians, next generation sequencing might mean several things. It might mean a way to simultaneously test many known candidate genes quickly and inexpensively, and this is formalized, of course, in gene panels which are available for purchase. It may be considered to be something that's used by researchers to find genetic causes of disease that are rare or common, but really this devolves into Sanger sequencing eventually, so if the field decides that a variant is important, then eventually it's going to be offered as a clinical test and it's not really going to come up in this sphere. So what I'm going to focus on is the clinical test that's used to look for unspecified or novel diagnoses in single individuals and families, and unfortunately, given our conversations this morning, most of these will be similar but not identical to known diseases in other known families, and again, they'll usually be single individuals or small families. Many of them will have neurologic phenotypes and things like developmental delay and cognitive impairment, which are somewhat nonspecific. So we all know that there's a wide spectrum of clinical uses for genetic testing, and I would say that they cover a spectrum of requirement for certainty and the use of clinical judgment. So, for instance, you want to be very certain for preimplantation testing or the recommendation for prophylactic mastectomy. However, you might use a little bit more judgment if you've got a low probability variant but you're considering a life-saving therapy, and in some cases, the diagnosis itself may be therapeutic, especially, for instance, when parents feel guilty about disease causation. So to be able to use this clinical judgment, a clinician needs to have enough information to determine what the strength of the evidence is.
Our group also talked about the division between reporting DNA sequence variants and reporting gene implication. Actually, there are some good standards that have been published for reporting variants. Clearly it's still a difficult issue, but there are standards available for saying how you should go about this. There are fewer standards available for talking about gene implication. So in some ways this is a particular feature of next generation sequencing because, for all but a few genes, it's non-hypothesis-driven, you're testing things where you don't know what to do with the results of the test, which is in general not favorable for clinical testing, but that's what you do. And this may benefit from a scoring rubric to categorize levels of evidence for gene implication. Obviously biological implication is easy to hypothesize, but it's also, as we've said over and over, easy to overstate the evidence as well. I'm not going to go through this in detail. David Dimick mentioned that at their institution they had a scoring rubric for gene causation, which might be useful to put into reports. A few general principles. Next generation sequence data generally arrives in the clinical setting as a report. So this is generally how it's communicated from the testing site to the clinical setting. And of course some results should be immediately apparent and highlighted in such a report. So things that are clinically actionable, variants with severe health consequences, and actually there's an American College of Medical Genetics set of recommendations that's pending about some specific genes that should be looked at that are in this category. But I would say that the final responsibility for interpreting and returning results really resides with the ordering clinician. And so when a report includes a phrase like variant of unknown significance, basically that's transferring some of the burden of interpretation to the physician, and they might decide that that's enough for clinical action or not. For instance, if it's the second variant in a gene that they're highly suspicious of. The use of consultants, of course, may be necessary, and that might include the testing laboratory. But I think that we need to think about empowering the clinician to be able to weigh the evidence for data used in this manner. Of course the level of expertise for any given variant or gene will vary among clinicians. It may be a gene they're familiar with. It may not be. And the same is true for their ability to analyze raw next generation sequence data. And by raw, I mean even a variant list. So the central issue here is how is uncertainty conveyed from the testing site to the clinician and from the clinician to the patient. So we want to be able to provide enough information to the clinician to allow this clinical decision making, but we don't want to bury important information in a mass of data. And we can't assume an excessively high level of analytic expertise on the part of the clinician. So this is an example of a SNP array report. They found a 15 megabase region of contiguous homozygosity and recommended that the pediatrician type this into the UCSC genome browser and see if they could find something interesting. And I would suggest that this is maybe a paucity of interpretation. So I feel that reports should err on the side of providing more information rather than oversimplifying.
And for the purposes of this group I think that it would be good if we could think about any opportunities to standardize the language we use to communicate the uncertainty and the analytic techniques we're going to use. So next I'm going to give a little clinical vignette. It's a pastiche of a couple of different cases that we've had, but the reason why I bring it up is I wanted to use it to illustrate how clinical exome information is currently passing through the clinic. So imagine a family with a child with a complex, severe medical syndrome that's really destroyed the family's life. Conventional testing does not provide a diagnosis and a clinical exome sequence is obtained commercially. The lab director carefully analyzes the data, doesn't find any known genes that would explain the phenotype, and some secondary variants are reported back. So six months pass and the patient is sent to a consultant who has just read about a gene that looks very similar to the family. Five families are reported with similar signs and symptoms, a knockdown in a zebrafish showed a phenotype that was similar to the affected families, and we just talked about the complicated nature of that data. And the consultant wants to see if the exome sequence detected any variants in this gene. The gene isn't commented on in the report, which is not surprising since it wasn't a known disease associated gene when the two page report was assembled. The testing lab is hesitant to return a full variant list because it includes variants that have not been CLIA certified and may vary in quality, but eventually the consultant gets a copy of the variant list and she notices that there are no coding variants in the gene she was interested in. However, there's also no indication of how well the gene was covered by the exome sequencing. So she turns to a research colleague who agrees to sequence the gene. He doesn't find any variants in the gene, so he wants to look at the exome sequence, and he searches the variant list and finds some variants in a different gene in the same pathway. The new gene is not known to cause human disease, but based on well-known cell biology, the mutations in the gene that he has found are predicted to alter cell physiology in the same way as mutations in the known gene. And of course at this point, the family calls their primary clinician to tell her that they are 10 weeks pregnant and they would like to know if the follow-up work on the exome sequence detected anything they could use to test the current pregnancy. So all of the parts of this, I would say, are bread and butter of medical genomics at this point. I mean, all this stuff happens pretty regularly. So there are lots of things we could talk about with that vignette. I'm actually more impressed with the specific ones that we had before for that purpose. So I'd like to mainly talk about communicating exome results. So the current standard in the labs that we surveyed a couple of months ago, and in the reports that we've gotten back when families bring in exomes that were done before they come to us, is that they get only results that are deemed to be important by the testing laboratory on a couple pieces of paper. You could imagine returning a larger prioritized list. You could return selected categories of results. Or you could return all of the results in an annotated form with a summary of the important findings. And I like to call this the radiology model, where you might get a DVD back.
This would of course require, once again, standardized annotation. And you could run the risk of overwhelming an inexpert interpreter if the summary was inadequate, but this does give a lot back as far as data portability. And many patients have come back to us and argued that even though the sequence they obtained was done under the auspices of a research study, it was done on them and they should own the data ultimately. And if they actually paid for it out of pocket, which has happened too, then they may have a good argument. So what's in an optimal report for clinical use? Well, quantitative information, if possible, about pathogenicity, and we've talked about that quite a bit today, how difficult that is to do, but also for disease association with a gene that you're presenting that's not a known gene. Information about how the analysis was done, analysis assumptions, information about control populations used in the analysis, because if incorrect control populations are used, of course, you could think something is rare when it's not rare, it's just not present in the control population. And then limitations of the analytical methods. There are some general ones, which I think are usually included, but what might also be useful is specific information about what was covered and not covered in that exome sequencing instance. And all of these data need standardization if they're gonna be readily interpretable and comparable in a clinical setting. So we looked at six different testing sites as part of a talk we were preparing at one point, and the issue that I just wanna point out here is that there's a lot of discrepancy between what different sites are doing, whether they require family DNA, whether they do parental exomes, whether they return secondary variants, whether they return variants to kids. So the field is evolving quickly, and perhaps that's a reason why what we're doing here has some good potential. This is a graph I prepared a while ago; along the left-hand side is the number of variants on a log scale, and along the bottom are variant analyses. So this is a filtering pipeline, and I originally used this to show that the red cases and the black cases differ by the use of family data, so you get fewer variants if you use family data. But the point I wanna make is that there are assumptions made at each one of these filtration steps, and those assumptions are generally not put into a report that is sent to a family currently in a clinical setting. I think that how exome studies and next generation studies fail is also not communicated very well. So for instance, if you send off Sanger sequencing for an entire gene, and you get a result back that says that they didn't find anything, it's a reasonable assumption to make that everything was pretty well covered, that it was high quality sequence, and that they excluded the presence of variants in that gene. Of course, we all know of exceptions, but in general, that's true. And if you're the analyst, you can look at the trace on the top and say, well, that's good quality sequence, I can make good calls. By the time you get down to a baseline that looks like the bottom, you might say I just need to redo that data. It's not good, and that would be a standard part of doing CLIA testing using Sanger. But for next gen sequencing, you get a report back and you have no idea what was covered and what wasn't covered. You don't get a profile of quality over genes that may have been of interest to you.
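As a minimal sketch of the filtering-pipeline point above (not anything presented at the meeting), the snippet below shows how a pipeline could record the assumption applied at each filtration step and the number of variants remaining, so the log could travel with a clinical report. All field names and thresholds here are hypothetical.

```python
# Minimal sketch (hypothetical field names and thresholds) of a variant
# filtering pipeline that records, at every step, the assumption applied and
# the number of variants remaining, so the log could accompany a report.

def run_filters(variants, use_family_data=True):
    """Apply successive filters; return surviving variants plus an audit log."""
    steps = [
        ("rare in the control population (allele frequency < 0.5%)",
         lambda v: v.get("population_af", 0.0) < 0.005),
        ("protein-altering effect",
         lambda v: v.get("effect") in {"missense", "nonsense", "frameshift", "splice"}),
        ("passes the sequencing quality threshold",
         lambda v: v.get("quality", 0) >= 30),
    ]
    if use_family_data:
        steps.append(("fits a de novo or recessive inheritance model",
                      lambda v: v.get("inheritance") in {"de_novo", "recessive"}))

    log = [("starting variant list", len(variants))]
    for description, keep in steps:
        variants = [v for v in variants if keep(v)]
        log.append((description, len(variants)))
    return variants, log


if __name__ == "__main__":
    demo = [  # toy variants, invented for illustration
        {"population_af": 0.0001, "effect": "missense", "quality": 50, "inheritance": "de_novo"},
        {"population_af": 0.12, "effect": "missense", "quality": 60, "inheritance": "inherited"},
        {"population_af": 0.0, "effect": "synonymous", "quality": 40, "inheritance": "de_novo"},
    ]
    survivors, log = run_filters(demo)
    for description, count in log:
        print(f"{count:>6}  {description}")
```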
And in fact, you tend to get reports back that look more like this, that have single summary statistics that don't tell you anything about a specific gene. And of course, as we know, coverage can be useful, but it also can be misleading in cases of compressions and misalignment issues. So we had a number of questions, which I'll finish up with. One is basically, are there different thresholds for publication and clinical use of data? For publication, a high evidence cutoff may inhibit dissemination of information. We've talked about that quite a bit. An excessive statement of evidence may lead to unintentional clinical use, and we see that with HGMD, that things get published and acted on with inadequate evidence. For clinical use, a high cutoff doesn't allow for clinical judgment. And specifically, if the lab director says I'm just going to put variants in this that I'm very, very sure about, then you don't allow the clinician to make a judgment about other variants that might be significant if the entire clinical story was known. And I think we're all familiar with the fact that in many cases when tests are ordered, even exome tests, the amount of phenotype information that the testing lab gets is not sufficient, not as much as you would want for the lab director to understand the full scope of the phenotype. And then a low cutoff may of course lead to medical error or patient harm if it's not flagged correctly. Another thought that we had was that common disease will be particularly challenging. I think that this is true for many reasons. In the clinical setting I would just say that with many stakeholders you often end up with a large body of self-contradictory literature, and the MTHFR C677T mutation is a good example of that. So as instructed, I tried to trim some things and finish up nice and early. So as far as questions, they include: how much next generation analytic detail should be included in a clinical report? Should we make a recommendation about that? Can analytic results and procedures be reported in a standardized manner? What's the best way to report clinical next gen study data? Are there different criteria for publication and clinical use? Are there different criteria for different clinical settings? Different clinical situations? Is the next generation study a one time measurement or a reusable resource for the patient that they can come back to over time? Does it expire? And at the end of the day, should a gene that is not well known to be associated with the presenting phenotype ever be reported or flagged in a study? Or is that just an inappropriate place for it? Thanks. Thanks. And Mike, if you could reset our timer then. So lots of good questions that were asked. David Goldstein and then Heidi and then Stylianos and Les. So I really like the idea that there would be a strong argument made that the full sequence data in some sense should be part of the report when you order a sequence. But that does then leave you with the question, who's gonna be charged with setting up the infrastructure to be able to make use of it? So one would like to be able to interrogate a gene of interest and say, was it covered or not? But the clinical geneticist is not gonna be extracting that information from the BAM files. So if this is the direction, then what is the idea about the infrastructure for being able to actually make use of these data over time? Well, I'm glad that you asked that, cause I like this idea.
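The point above about single summary statistics could be made concrete with a per-gene coverage summary along the lines of the sketch below; this is illustrative only, not any lab's actual report logic, and the gene interval, depth track, and 20x threshold are made up.

```python
# Sketch of a per-gene coverage summary built from a per-base depth track,
# rather than one genome-wide average. The intervals, depths, and the 20x
# threshold are placeholders; a real pipeline would compute depths from a BAM.

def per_gene_coverage(depths, gene_intervals, min_depth=20):
    """depths: chrom -> list of per-base read depths (0-based coordinates).
    gene_intervals: gene -> (chrom, start, end), half-open.
    Returns gene -> (mean depth, fraction of bases at or above min_depth)."""
    summary = {}
    for gene, (chrom, start, end) in gene_intervals.items():
        window = depths[chrom][start:end]
        if not window:
            summary[gene] = (0.0, 0.0)
            continue
        mean_depth = sum(window) / len(window)
        callable_fraction = sum(d >= min_depth for d in window) / len(window)
        summary[gene] = (mean_depth, callable_fraction)
    return summary


if __name__ == "__main__":
    toy_depths = {"chrX": [0] * 50 + [35] * 150}          # invented depth track
    toy_genes = {"GENE_OF_INTEREST": ("chrX", 0, 200)}    # invented interval
    for gene, (mean_d, frac) in per_gene_coverage(toy_depths, toy_genes).items():
        print(f"{gene}: mean depth {mean_d:.1f}x, {frac:.0%} of bases >= 20x")
```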
I think that again, with the radiology model, the images are there and there's a reader there. And so you can put some infrastructure there. Now are you gonna be able to realign the data? No, but with a fairly minimal amount of software, you would have the ability to go and look at individual variants, to search through them for a gene of interest, maybe to look at some annotations, whether they're homozygous, quality annotations. And you could also have the BAM file on there, so that somebody that was expert, if they went to a tertiary care center, could pull it off and make it into FASTQ files again. But as far as the DVD went, you could at least use something like a genome viewer to go and look at specific alignments to see whether there was a misalignment or an error, if you have that expertise. Just so I can clarify that, cause I also like this, but the suggestion is that the point of care would have at least a basic infrastructure to be able to do those things. That's what you're suggesting, as the radiology model, is that right? Well, so when I get a radiology study on a CD, it's got the software on it. So I don't need anything else except for that DVD. Now clearly there are levels upon levels of expertise that are needed and some people are gonna be able to make more use of that than others, but that's true for the radiology study too. If I get a complex MRI and send it off to who knows where, they're just gonna read the study and they're not gonna reanalyze it. If they have their own pediatric neuroradiologists, they can go in and do a more in-depth reanalysis of the data. So just some general comments. So there's a work group at ACMG that I'm chairing to write and address almost all of these, and that guideline was actually just finished yesterday, to be distributed for the various stages of the process of getting it through ACMG. And so I guess I could comment on what our opinion is in this document on a few of these if it would be useful. And then there's some of this we haven't fully addressed and I have my own personal opinions, but one of the things that we've stated, for example, is if you have a patient with a clearly defined phenotype for which there's clinical testing available, then that report has to contain detailed coverage for the genes that are relevant to that patient's phenotype. Now, obviously, if you have a phenotype that doesn't have a clear test out there, then that's difficult to do. So there's both the reporting of general details about how much of the genome or exome you've covered and at what depth, but then getting into those specifics when you've got a patient with Noonan syndrome and there are 10 genes known to cause Noonan syndrome, right? So can I interrupt you and ask a question about that? Because let's say that you do an exome sequence and you don't get MECP2 at all. Are you obligated to go back and do Sanger sequencing or something else and figure that out through a separate technique if needed? So you are obligated to do whatever it is your test specifies. So if your methodology says we sequence the genome and we guarantee that we will achieve an average coverage of X and complete at least X percent of that genome or exome, then if you have met that technical standard that you have defined in your SOP and validation studies, then it doesn't matter that you missed an entire gene, you've hit your specified threshold.
Now, obviously, that's completely suboptimal for the utility of that test and that patient with that Rett syndrome phenotype, which is why we've said you need to go in and actually specify what genes you've covered and to what extent for that specific phenotype. I will say there is at least one lab I've heard of, I think it's UCLA that is doing this, where they are putting on their website their average coverage on a per gene basis. So a physician could actually go in and say, well, my patient's got Rett syndrome, I could go order the Rett test for MECP2, but there could be something else going on, the test is only X percent sensitive, so maybe I'll start with an exome. But let me go check and see what the coverage is for the three genes or one gene that I know about for that phenotype, so you could at least have some inkling going into the game of what you're likely to hit on that particular gene, because there is some reproducibility there with respect to each of the genes. And at the end of the day, the question is what do we recommend for labs in terms of baseline requirements versus what are the things that simply let the lab differentiate itself from other labs? And so we actually talked about that specific thing, should we require labs to post on their websites what they cover, and we ended up deciding it wasn't a requirement, but I said that's a terrific idea, and gee, I could probably get more business if I did good things like that. So some things you just have to leave to labs to differentiate their services and other things you have to say, this is a minimal standard; I think trying to decide between those two things is sometimes challenging, but it's something we need to do. So we also had an extensive conversation about whether to return results from unknown genes. And there is a clinical group that says in clinical use, this should really be the so-called clinical exome, which is all those genes for which there is an associated phenotype, and you exclude the non-clinical exome from analysis, but if someone would like their genome that was negative sent to a research lab to pursue that question, they can. There's the other body of individuals who disagree with that and say, you know, we can figure some things out here, we should be allowed to. And the group that won out said yes, we should allow labs to delve into this territory, though it is tricky. And that's why I was asking David that question yesterday about debating whether to return that result, which is clearly inconclusive because there wasn't enough evidence to implicate that gene in the patient's phenotype, yet maybe there's a higher likelihood that result will be pursued in subsequent assessments if it was in fact returned, albeit with the uncertainty conveyed. So right now the guidelines are stating that it is okay to return that in a clinical report, but you have to make some plausible arguments for why that is a potential cause. Now, that gets back to the whole point of all of us gathering here, which is what makes a plausible argument, and we were not able to even get into specifying the answer to that. So that's just a few comments; I can comment on other things that people are interested in. We have Stylianos next, but did anyone want to follow up specifically on Heidi's comments? So Gonzalo and Ben, and then we'll go back to Stylianos. So on your comments, Heidi, I think that the last one is actually probably a good topic for a research study.
So I think it's not so clear; I mean, it seems to me that you can't not return it ever. If people request it, then it's a test that you did on them, and I think people have a right to know, and probably in medicine people return all sorts of results that are ambiguous and cause worry, and no one is really sure what they mean all the time. But in general, is it a good thing to do? Will it sometimes help diagnose things, or will it just cause anxiety? That's probably a testable thing. You could say, if we use some standard to return those, how often would it lead to a new cause, and how often would it not, right? So I think there's probably not the literature out there to tell. With your earlier point about analytic detail and coverage of genes, in the context of ESP, for example, we tried to report regions that have high coverage, and so where we thought we would be able to call, perhaps, and for example, in the context of 1000 Genomes, we actually went one step further, and in addition to talking about coverage, we actually defined a mask for the genome where we thought we could analyze it with the highest confidence, and actually we had two levels for that, and at the strictest level, we only said for 70% of the genome we have the highest confidence. The other bits we think will have a higher rate of errors, and actually coming up with a way to define what the current state of technology is, what you can do really well, and what the regions for improvement are, over time that might help you decide, is it a one-time resource, or is it time to redo it? For this patient, I'm interested in these genes, and actually I think the analysis of these genes now is much better, because now that I have longer reads or better quality mapping or something, the data there is clearly more definite or something. Yeah, I think what I was gonna say was basically almost exactly what Gonzalo said at the end, which is we found it useful, as much as I love coverage as much as anybody, to phrase it in terms of power to discover a variant if it were to exist, given the error model and the data that you used to generate the call. That gives you a profile of how well the experiment has worked, and then you can go back later and say this experiment worked well for this part of the gene that I was interested in, but not so well for that part of the gene, and maybe at some point, as Gonzalo points out, that activates future experiments, but just quoting 30X coverage probably covers most of the genome pretty well, but maybe not all of it, and it certainly isn't granular enough if you want more detail than that, and I think quantifying it as power to discover a variant were it to exist makes sense. Stylianos, thanks for being so patient, sir. I'm wondering how important is the informed consent in all this flow of information and the disclosure of incidental findings, or findings on actionable variants? Actually, is there a list of actionable variants? Isn't that what we're trying to generalize? So, in other words, are there studies of how the consumer reacts to the different options of informed consent, an informed consent that's very open and liberal, and an informed consent that's very restrictive to a few genes, and how the consumer could change his mind if there's an in-depth explanation of it? Well, a couple of things, the breadth of consents that are being used is pretty wide right now for the clinical exomes.
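One hedged illustration of the "power to discover a variant were it to exist" idea above is a simple binomial model of heterozygote detection at a given depth. The 3-read calling rule and the 50% alternate-read probability below are illustrative assumptions, not the error model any group described.

```python
# Toy binomial model of the power to detect a heterozygous variant, were it to
# exist, at a given read depth. Assumes each read shows the alternate allele
# with probability 0.5 and that a call needs at least 3 supporting reads; both
# are illustrative assumptions, not any group's actual error model.

from math import comb

def het_detection_power(depth, min_alt_reads=3, alt_read_prob=0.5):
    """P(seeing at least min_alt_reads alternate reads | true heterozygote)."""
    if depth < min_alt_reads:
        return 0.0
    return sum(comb(depth, k) * alt_read_prob ** k * (1 - alt_read_prob) ** (depth - k)
               for k in range(min_alt_reads, depth + 1))


if __name__ == "__main__":
    for depth in (4, 8, 13, 20, 30):
        print(f"depth {depth:>2}: power to detect a het = {het_detection_power(depth):.3f}")
```

A profile like this over the exons of a gene of interest would say more than a single average coverage figure, which is the point being made in the discussion.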
Some of them are very, very long, six pages. You have to sit down, and it probably takes a couple hours to go through them; others are fairly short. From my own personal experience doing consents for exomes, and I'm sure other people have such experience too, it's like Huntington's testing. It's really dependent on the family, highly variable, some families say don't tell me, I don't want all that detail, just do your work, all the way to the other end where people are still asking you questions three hours later about this little piece, the engineers in particular. David, did you want to comment on that in the wheeler? Yeah, I did. So, I mean, I think there are two things I want to answer, Ben. So, for our clinical lab, we do whole genome, but we report back in our report what areas of the coding regions of genes were covered. So, we require the physicians to provide a list of genes that they think are relevant to the phenotype. So, we don't think that should be a lab responsibility. We think that should be a clinician responsibility. We actually return back what exons of those genes were not covered to a point at which we think we could reliably detect a heterozygote. So, that's the way we define it. In terms of the ethics and consent, our consent form is actually a single page, but we take a lot of time getting to the point where the parents, in the case of kids, or the adults, can fill it out. We have, I think, probably one of the most liberal policies on data return in our lab. So, we will return secondary results that are adult onset and that are untreatable to parents of a child. We find that there's a huge range of what parents want and it's not predictable when they walk in the room what they're actually gonna walk out signing. We have a huge degree of concordance typically between parents, but there's no concordance really across different families. So, it really seems to reflect families. The vast majority, more than 95%, of people we surveyed, and we've surveyed I think almost 1,000 healthcare providers now as well in separate surveys, more than 95% believe that the secondary results should be available, and more than half actually want untreatable and/or adult onset results if they had their child sequenced. And we see a very similar thing with parents, that more than half would like adult onset results returned. Les, you had a comment? Yeah, so what I originally pushed the button for was that you mentioned the 2007 ACMG mutation categories, and just an FYI, those were of course drafted probably between 2005 and 2006, when we all should have been aware of this but apparently weren't, that the presumption that known pathogenic variants had a lot to do with reality, and now we know that that's not true, and those definitions accept that presumption, and I think it's now well appreciated that that's not a really great presumption, and so my understanding from Mike is that the college is now going to dive back into that and revisit those. There are also some issues that those categories are not mutually exclusive of each other, so they're a little bit problematic, so I wouldn't just jump on those as truth. Yes, there is a group in the college working on a list of secondary variants that the college feels should be returned in essentially all genomes or exomes done for clinical purposes, independent of the indication for the original test. It's a, how long have we been working on that, Heidi? About a year and a quarter?
Yeah, it's really hard to do, and there are some things that you can jump on pretty quickly, where you can get to agreement fast, and then there are a lot of things on the edge that you can argue about until the cows come home, so we're taking a very narrow approach and setting that threshold high enough that we think very few people will disagree with the list, but there will be a list coming. It's always, it's been three to four months away for about nine months now, so it should be coming. What's the order of magnitude length of that list? Oh. Is it 10, 100, 1,000? It's less than 100, much less than 100. I think the- Is this a list of genes or a list of variants? It is not a list of variants. It is a hybrid list, because you have to be very careful about whether you talk, for some genes it matters a lot whether you're talking about genes or the phenotypes that are caused by mutations in those genes, because those don't map one to one, as you well know, and so different disorders are being handled differently in that respect, because it's a really important question, and then there's which variants in those genes are included in the recommendation that they should be returned, and some of them we know what kinds of variants there are, and we've talked about a lot of those issues, but there's a lot to it. I think it might be worth just underlining a point that's kind of been made by two or three people together, which is the difference between the discovery study, where we'll start by taking raw reads and mapping them to a reference and calling a variant file and working from a variant file, and the clinical environment, where that is not the appropriate standard, and the appropriate standard is to know what the coverage is for every base, and at every position you're interested in, what the quality score is for that, and so you need to know what you're missing and what you're not, and so pulling genotypes for known locations is one of the first things that we would do with any new genome, which is a little different from the discovery based approaches. Can I ask why we don't do that for array CGH? Yeah, why don't we pull every single probe? We don't have those data, they are not available to clinicians. So if I'm looking at something for LOH, I have no idea if that probe actually worked or not. I think that probably that technology to an extent was the first high throughput technology that was in the clinical domain, and so I'm not sure there were any; it surprised everyone in a sense that it was there. So I'm not sure, I think maybe now we're going back to think what are the processes that need to be in place to deal with high throughput tools, high throughput technology in the clinical domain. And there were a number of questions like that. I've asked the same question when people talk about secondary or incidental findings. The array CGH people don't seem to be the slightest bit concerned about that question and the sequencers are obsessed with it. I don't understand how that works out. I was just going to say that for an oligo array at least you're doing multiple tests in the same area. You can look at fluorescent intensities, maybe a hundred or a thousand, across the area you're looking at, and there's a smaller amount of information per base potentially, I guess, for this. So maybe that's reassuring, but I don't know. Just to address that, my feeling is it's a simpler environment because the thresholds are set by size primarily and not necessarily by disease.
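The "pulling genotypes for known locations" step mentioned above might look roughly like the sketch below, which checks a short list of clinically important positions against a single-sample VCF and flags sites with no call at all. The file contents and coordinates are invented for the demo, and an absent call should prompt a coverage check rather than being read as reference.

```python
# Sketch of pulling genotypes for a short list of known positions out of a
# single-sample VCF. A site that is absent from the VCF is reported as None,
# which should prompt a coverage check rather than being read as reference.
# The demo file and coordinates are invented.

def genotypes_at_positions(vcf_path, positions):
    """positions: set of (chrom, pos) with 1-based pos, as in VCF."""
    found = {}
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos = fields[0], int(fields[1])
            if (chrom, pos) in positions:
                fmt_keys = fields[8].split(":")
                sample_values = fields[9].split(":")  # assumes one sample column
                found[(chrom, pos)] = dict(zip(fmt_keys, sample_values)).get("GT", "./.")
    return {site: found.get(site) for site in positions}


if __name__ == "__main__":
    # Tiny single-sample VCF written out so the sketch runs end to end.
    with open("demo.vcf", "w") as out:
        out.write("##fileformat=VCFv4.2\n")
        out.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n")
        out.write("chr7\t117559590\t.\tA\tG\t60\tPASS\t.\tGT:DP\t0/1:42\n")
    sites = {("chr7", 117559590), ("chrX", 153296777)}
    print(genotypes_at_positions("demo.vcf", sites))
```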
And so it's easier to come to a standard sort of feeling about, okay, we'll return anything that is larger than X or has a certain number of probes contributing to the result. And it's a much simpler thing than four million variants being called in a genome and how do we figure out which ones to convey back. And so I think it's just a different question that's been addressed. But the other thing, in talking about this issue of distributing the raw data from a clinical test for additional analysis, when we brought that up as the sequencing group, the cytogenetics group was very much against the distribution of the raw data set, and their arguments were, we each use different thresholds and different QC and different this, and if somebody took our data set and put it into their algorithm, they'd get something different, and then there'd be the liability of did you not call this, and of course those things are true also in the molecular sense, that we use different aligners and different variant callers and different thresholds, and some might be lower but Sanger confirm everything, some are higher and they don't Sanger confirm, and so it is tricky to sort of say what's appropriate in terms of just saying, yeah, you can have your data, and is it annotated data, is it reads, what is it you're giving out. And when we look at a genome with variant calls, there are lots of loss of function variants being called that are pseudogene misalignments, or two bases got called as independent variants when in fact it's a two base substitution and it's a silent variant, and this could throw people off if they see all these and a quick scan of whatever database says that that's disease causing. So I'm fine giving really, really raw data out, feeling that the only people that can make heads or tails of that are people who understand the limitations of the data, but giving out annotated variant data sets that we have not looked at from a quality perspective to other people, that scares me, and that's where I would rather give the very raw data out than give this sort of intermediate level that doesn't meet the quality threshold. So I guess I was responding to you on this point, but I don't quite remember exactly what I was gonna say, but basically the short of it was, while the goals of the research and clinical enterprises may need different outputs at different times, I think by and large we need an extremely overlapping and common set of tools. I mean, there's obviously no reason why either one of those activities could possibly want to use anything other than the best in class algorithms for extracting the data from machines, for calling variants, X, Y, and Z.
In many of our research studies we find ourselves wrestling with conditions, either individual cases or groups of patients, that have similarity to existing conditions, and absolutely the first thing you would wanna do is have a clear assessment of the coverage of not only every exon of genes known to be related to that phenotype but every known mutation in those areas, and I don't think these are tools which are only suitable to one or the other of these activities, and the more that we can build common strategies for tackling some of these problems, the better position we're in to sort of enter this reality where clinicians are constantly bringing all of us cases where we can't immediately say whether this is a diagnostic test we're running or the expectation is this is probably going to end up in research and we'll need to match up with lots of other people working with similar cases. I might note that we haven't really talked much about quality control here, or quality standards, other than to say there should be some and we should all adhere to the highest ones, and I have the impression that there are other groups that are discussing and publishing on those, so just to be sure that we're not missing anything, do we all feel that that's being adequately covered elsewhere, and that simply saying we should adhere to some, or somebody should adhere to some, and they should be the highest, is enough, or do we need to discuss that here? I'd be interested in your opinion about this, but I don't feel that there's a uniform enough standard for quality that can move between centers. Yeah, we came to the same, sorry. We came to the same conclusion when we were writing the guidelines. You know, there's this question of should we recommend a minimum coverage and a minimum this and a minimum that, and the challenge is that each of the technologies is different and what you call the minimum varies across platforms and approaches, and they're likely to change rapidly, so we made the decision that the lab had to clearly validate and, you know, argue for what their technology was capable of doing, and we gave some examples, given that a lot of labs are using similar technology, so we sort of said some labs are taking this approach, just to give some, because people really wanted us to say something concrete, but we just felt that that was not doable at this stage. So I just wanna say, in relation to quality, I just wanna remind you of the very good point that Daniel made earlier, that in a lot of these really functionally interesting and important regions the error rates could be much higher, and, you know, I think that's an important message from this group, that you can get overall a very high quality dataset, but of course on the regions that are most functionally important and whatnot the error rate is potentially going to be a lot higher, and somehow that has to get factored in. So really not the error rate but the proportion of findings that are errors, which is a little different, yeah. So I think the error rate as well; my understanding is that it's because of things like GC content and stuff. Close homologs. Yeah, yeah, all kinds of stuff, so it's not just the prior probabilities but also the raw error rates, I think, that are tougher in genes and control regions. It's very ironic and sad. So I think that's repetitive regions though, so you'd get a big win. I mean, our major failure right now is in mismapping of reads rather than making sporadic base call errors.
So I think from a clinical lab versus a research lab there are two very different ways of looking at it. In a research setting we're obsessed with false discovery rates and not trying to find something that isn't real. In a clinical lab you're very concerned about missing something that is real. And so the way you tune your software for variant calling is gonna be very different when you're trying to minimize your false discovery rate versus trying to minimize your false negative rate. You know, most clinical lab directors will have the opinion of, well, if we just do a few extra Sanger reactions, well, it's a bit of money, but okay, I can live with that. So I think there is actually a difference in the two worlds. For you, also, clinical labs are bound by a bunch of rules that require obsessive documentation that doesn't necessarily improve quality but is required. I think one of the things that we found when we moved into this world is also that, should we say, the versioning is less than optimal within the research sphere, in terms of actually working out exactly what database was used underlying what, and being able to go back three months or six months and say we can do the exact same analysis again because we have the tools, such that we actually brought copies of all of the databases in-house so that we could version them. And so I think that's not quality in the sense that if you don't have version control you're doing bad science, but it's a problem from a regulatory point of view if you've got to prove it. So I think there are some differences in clinical versus research that are not necessarily obvious and don't necessarily affect the quality of the result, but do affect the rules that you operate under. I think what I was gonna say is redundant to what's been said, but the measures of quality for these data, and what's an achievable standard, are changing very quickly. When Daniel wrote his paper on the 1000 Genomes data he said that actually half the stop variants and so on that we originally reported were errors. And then more recently we set out systematically to find out how well we were doing with calling insertion deletion polymorphisms, which would cause lots of frame shifts, lots of highly interesting variants. And in our data set as of a year ago or nine months ago, it turns out that probably 90% of the frame shifts were not correct and probably half of all indels were not correct. And this is taking the consensus of the five best available methods and so on. These are hard areas in some way, and it's actually good to try and systematically quantify what you can do, and there is sort of a temptation to always say we can get 95% of the stuff or we've got an error rate of one in a million, and it's probably not there. I just wanna amplify on Gonzalo's point: I do think now, in this sort of, how should I say, era of personal genomics, people just have this assumption, oh, it's no big deal to sequence something, and they don't really think too much about what it means to sequence a genome and the quality of the underlying thing, and I do think that it's good having some standards for what constitutes re-sequencing a genome, what constitutes doing these things, that are useful for publication and reporting results.
Yeah, I was just gonna say that I think the problem is that there's really no getting around the importance of sort of knowing what you're doing, because it really is true that there are situations in which you wanna maximize sensitivity, and so you say it's actually okay that I've got a lot of false calls in there, because whatever comes out, I'm gonna look at it, and if I'm interested in it I'm gonna Sanger confirm it before I do a darn thing, and that would be a very appropriate decision to make if you were sequencing a child with an undiagnosed condition; you'd say okay, I'm just gonna call everything, and then everything I'm interested in I'm gonna follow up. But the problem then is, if you sort of go that route, then people start using those kinds of standards for calling and making arguments about differences in the load of certain kinds of mutations between cases and controls of a certain kind, and so that means it's very difficult for us to say, look, here's the standard for how you ought to sequence genomes; that's really tough, and what you have to do is say, look, you have to think really hard about exactly the way you're using the sequence data and act appropriately given how you're using it. Yeah, and I think it's also, this I actually do think is a little bit different between following up variants for Mendelian traits, if you're trying to look at segregation through a family, where first you could be very strict and see what you find and then you could loosen up your criteria. However, I think when we're testing for variants and we're using a burden or aggregate type of test, it's a different ball game, I mean, because you don't, and it's a very fine line and I don't really know what that line is. So you could overly clean your data, but then, yeah, your false positive rate is really low but your false negative rate is very, very high and that will kill your power, and I think, you know, we're going to be a little bit robust to a certain percent of false positives there, but we can't have our test overwhelmed by false positives, so it's very hard to say how you would properly QC the data and have that balance, and I think for complex traits in a way it's much more difficult because we're analyzing variant sites in aggregate, and we certainly wouldn't, you wouldn't want to do different iterations of cleaning the data and retesting, so that I would not advise. I mean, I guess in the clinical setting you do have the safety net of having independent validation of those variants, which you won't have in a load based association test. You might have like an overall sense of your false positive rate but not for every single variant. Right, right, yeah, because with the Mendelian trait you could follow it up; it's more work but it's definitely very doable, and especially if you're working within a smallish linkage region, it wouldn't be that many variants to follow up. I think just to answer your question directly, it's clear from the discussion that probably including some of this information and some of this discussion on quality would be a useful thing, I think, for this document.
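The clinical-versus-research tuning trade-off discussed above can be illustrated by sweeping a single call-quality cutoff over a toy labeled call set; the scores and labels below are fabricated purely to show that no one threshold minimizes both the false negative rate and the false discovery rate.

```python
# Toy illustration of the tuning trade-off: sweep one call-quality cutoff over
# a small labeled call set and report the false negative rate and the false
# discovery rate at each setting. Scores and labels are fabricated; the only
# point is that one threshold cannot minimize both error types at once.

def error_rates(calls, cutoff):
    """calls: list of (quality_score, is_true_variant). Returns (FNR, FDR)."""
    kept = [(q, is_true) for q, is_true in calls if q >= cutoff]
    true_total = sum(is_true for _, is_true in calls)
    true_kept = sum(is_true for _, is_true in kept)
    false_kept = len(kept) - true_kept
    fnr = 1 - true_kept / true_total if true_total else 0.0
    fdr = false_kept / len(kept) if kept else 0.0
    return fnr, fdr


if __name__ == "__main__":
    toy_calls = [(55, True), (48, True), (35, True), (22, True),
                 (40, False), (28, False), (15, False), (10, False)]
    for cutoff in (10, 20, 30, 50):
        fnr, fdr = error_rates(toy_calls, cutoff)
        print(f"cutoff {cutoff:>2}: false negative rate {fnr:.2f}, false discovery rate {fdr:.2f}")
```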
In particular, it's another nice example, I think, of the two worlds sort of meeting in the middle, in the sense that the things that the next generation sequencing technologies are very good at, and where the focus of a lot of discovery has been, are single nucleotide variants, and in places those variants have traditionally been viewed as less important than the ones more likely to cause frame shifts, which are the indels that we've traditionally not been very good at with next generation. So put that together with repeat expansions, which beyond a certain length are also a challenge for next generation technologies, and I think we have some significant limitations to what can be done that are not really being talked about very much. So I think this could be a useful forum to do that. Can I make a specific suggestion, just to throw it out there, and you guys can say this is not sensible, but you know, in many areas of medicine it's common to think about getting a second opinion, right? There's no reason why we couldn't say, even for something like the genome, if you're getting a diagnosis you could consider getting a second one; there's no reason why you can guarantee that the first one is definitive beyond all doubt, because of all these things, there's errors and technologies change and so on. Well, David, I mean, think about how you work; you work clinically, you get a weird clinical test, the first thing you do is you repeat it. No, no, I mean, but you know, when you look at, you know, David Goldstein put up an example of a particular family that had the child with a condition that was hard to diagnose, right, and they went through many, many different genetic tests, actually not just two, maybe like five or six or something, I just browsed the blog, before they got to the right one, you know, and so I don't, I mean, I think in medicine and probably in many areas of life things are not as definitive as we'd like them to be; if we had a perfect genome and we knew how to interpret everything, great, but it's not there, and probably you have to get the two clinicians to talk to each other and figure out the difference; it doesn't seem such an impossible or unreasonable thing to figure out. The limitations or even systematic errors might be pretty correlated across the different exome based attempts at a diagnosis anyway, in which case there may be less benefit there if that's the case. So I was about to jump in and tell Gonzalo that's a really bad idea, and then I kind of thought about it for another second. So there's a false negative issue and a false positive issue in thinking about a second opinion, right? So for a false positive issue, it seems like a Sanger confirmation would be a much simpler route to getting to an answer, right? But for a false negative issue, that's where a different approach comes in. And then I guess you could also break it down: what does it mean to get a second opinion? Is it the act of sequencing in a different place with different tools and all that? But then there's also the analysis, right? So you can imagine someone who just has a different interpretation of the same data, so you wouldn't necessarily have to even generate the data again, you just take your data somewhere else and get it reanalyzed, right? So we had Les and then Joel, I think. I was going to change topic anyway. Joel, did you want to speak to this?
Yeah, I guess just more on sort of second opinions, I think it also speaks to the idea that the patient should be able to have access to the full set of data, which would allow them to go, you know, just the same way you could have your CAT scan read by radiologists at one hospital and then you take it to radiologists at another hospital and they'll look at it again. And I guess even repeating tests, sometimes we do that, you know, if there's a biopsy that's indeterminate or something like that, then sometimes people need to go back for that even though you'd prefer not to. But I think it does speak very strongly to the idea that regardless of what's reported back, patients should have the option of getting all of that information at whatever level they want it. I just want to comment on your third bullet, which I think is an incredibly interesting question that might warrant some input from a group like this; there is also clearly a need for empirical research into this question, because it's not an easy question, I don't think. But one of the considerations I've been struggling with is arguments with folks who will tell me things like, oh, we have to control how this testing is used, and I have to remind them, these are usually non-medical folks who will say this, and I have to say, you have to realize that once a test is clinically licensed or available, any clinician can use it for any purpose they see fit, and we therefore cannot control it; that's by design of the entire system, and we'd have to change our entire system of medical training, medical licensure and control of tests if you want to actually control that, and I'm not sure that's even feasible. So that third bullet, though, is a potential answer to that question, in that what frontline clinicians almost always tell us when we've worked with them is they want the answer boiled down to the simplest, plainest statement of what the test means and what they should do, and I would suggest that a consideration is that there may be a good reason to definitely not do that with this kind of testing, and that you should present, like the paragraph slide you had, Heidi, all of the reasoning and all of the dimensions of considerations that go into interpreting these tests, because that is how we actually think about these things, and if you're going to get a right answer you actually do have to consider all those variables in the context of the clinical situation you're in. And so sort of forcing the issue, making it clear to people in how you present the result that you actually have to understand and be able to think about all these things to use these data, there's merit in that, and we shouldn't oversimplify and just give it in too simple a fashion. We do need to move on, but we had Russ and Heidi, and David, I'd like to hear your clarification on that one. Just to very quickly agree with that, I'm aware of a pharmacogenomics study where there was supposed to be a very limited intervention in a single clinic; they announced the availability of this pharmacogenomic test at a staff meeting, and they closed down the entire operation in two days, because every physician who had any patient that they were even considering using this drug on had ordered the test, and they had brought the lab to its knees.
So that tells us a few things: that physicians are very eager to use genomics if you offer it to them, but be careful what you offer, because they will order stuff, and then we're really gonna be left with this problem. Just to comment on Russ's thought about not simplifying the reports too much: so I agree with that, and I do feel that there needs to be significant evidence-based logic in the reports, but if everybody read the news that came out Monday about the patient who had cancer because somebody misread the report, we try to balance that and have an overall result that is extremely simple, and the report starts with one word, positive, negative or inconclusive, and then we go on to explain why. And in coming up with the recommendations, we made a recommendation that the report should start with a simplified statement understandable to any clinician, and we gave examples of that, like here, for an exome sequencing test where you didn't find an answer: negative, an established or plausible cause of the reported phenotype was not identified, highlighted in a big bold box, and then you go on to say these are the genes we looked at and this is how our algorithm worked. And then the same thing on a positive; the example we put in, and this was a case where there were genes identified related to the phenotype but the variants in those genes were of unknown significance, so how do you give an overall result like that, and so we wrote: variants in genes with an established role in the reported phenotype were identified. So we're trying to come up with simple statements that give the essence of the report, so that there's not misunderstanding for the physician who does the five second read of a report, but yet also arguing that you really have to go into a deeper level of evidence for what your arguments are later in the report, so some sort of balance there is what I hope for. So we ought to end up with David, but I don't know which David; I meant David Valle, but David Adams has his hand up. I'll just very quickly. I think Les made a good point about bullet three, that it's a topic that could be investigated further, and I hope throughout our manuscript we will make a number of points that certain of these questions need additional research, and there's every reason to do that to move the field forward. And then I'm reminded by this last discussion of the paper, now probably 10 years ago, from Frank Giardello at our place, who studied molecular testing for colon cancer and looked at the level of understanding of the various people in the chain, and the physicians who ordered it were down at like 25% knowing what they were doing, and the counselors did better, and the patients, well, it just gives you an idea of the magnitude of the educational effort. All right, thank you. So I think we're ready to move on then to our last working group. Mark, you have the cleanup slot here.