Okay, so just a quick update, and for the members of the working group who are listed here, who were present last night: please make sure that I captured our discussion points. I thought we had a very good discussion last night. We tried to define some endpoints. So the end goal would be the return of results from genome and exome sequencing for patient care. But back to David Ledbetter's point, I should have said return of reports; I didn't quite get that captured. So the idea is to focus on quality, not on what the data in the reports is. There are different issues around the data that goes into the reports, and other aspects that need to be addressed around that. But one of the things we could do is really focus on quality: the same variant should be able to be called in all clinical laboratories.

One of the challenges we have faced in this working group is what we mean by these different goals. While, as Rex pointed out, we want to be able to call every nucleotide accurately, there's also the question of how we think about this from the clinical delivery side and the research side. David Ledbetter came up with the concept of building out principles for the metrics on the roughly 3,000 clinical genes first, and then building metrics on the remaining genes and the rest of the genome. We took that as a nice organizing principle. The concept, then, is that we really need to define metrics — what we mean by each of these. And on the clinical delivery side, context matters: how we look at it and what we're going to do with it are variable. For example, if the disease is monogenic, or you think you know what the gene is, that's a different way of looking at it than if you're doing whole genome sequencing and trying to derive an answer. But nonetheless, we need to build the metrics around that.

So we have broken these groups down into some best practices. On the wet lab side, we felt that laboratories are in need of guidelines for operating the platforms, and really the solution is that CAP, CLIA, and other agencies are going to come up with rules around that. So we shouldn't spend a lot of our time thinking about that; we do need to stay up to date with it, but we think they'll take care of it. Then there are the expectations for covering the relevant regions — that was part of what Howard was just discussing — making sure you're getting coverage of what you're looking for. Do we have what we are looking for? We need to stay with that.

The other side of this is that quality control metrics are measurable but not consistently defined. So one of the things this group could do, and plans to do, is work on definitions and metrics. What do we mean when we say specificity? What do we mean when we say sensitivity? What do we mean by accuracy in calling variants? Some of those issues really need to be resolved. As Deborah mentioned, she's going to send me the information on the checks that CAP/CLIA is working on, and we'll integrate that into our discussion points. The goal here, then, is to define the metrics that would remove the need for a second method of follow-up. One of the things we need to get away from is the idea that if we do whole genome sequencing, we then have to go back and validate what we find 23,000 times — that's a bit of a challenge. So the key question is: what does it mean to say we don't have to do that?
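[Editor's note: to make the definitional question above concrete, here is a minimal sketch — not anything the working group agreed on — of how sensitivity and related measures for variant calling might be computed against a gold-standard truth set. The inputs and the (chrom, pos, ref, alt) keying are hypothetical placeholders, not a proposed format.]

```python
# Minimal sketch: comparing a lab's variant calls against a gold-standard
# truth set. Variants are keyed as (chrom, pos, ref, alt) tuples; the inputs
# here are illustrative placeholders only.

def compare_call_sets(truth_calls: set, lab_calls: set) -> dict:
    tp = len(truth_calls & lab_calls)   # truth variants the lab called
    fn = len(truth_calls - lab_calls)   # truth variants the lab missed
    fp = len(lab_calls - truth_calls)   # lab calls absent from the truth set

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # True "specificity" requires a count of true-negative positions, which
    # is ill-defined genome-wide -- exactly the kind of ambiguity the group
    # wants to resolve. Precision (PPV) is a commonly reported stand-in.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}

truth = {("chr17", 41245466, "G", "A"), ("chr13", 32906729, "A", "C")}
lab   = {("chr17", 41245466, "G", "A"), ("chr2", 47641559, "T", "C")}
print(compare_call_sets(truth, lab))
# {'tp': 1, 'fp': 1, 'fn': 1, 'sensitivity': 0.5, 'precision': 0.5}
```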
Now, to that point, the laboratory directors are going to make their own decisions; they're going to decide on that. What we want to do is drive toward providing information that helps them get to that decision point. Formats need to be defined for the variants — what do we mean when we look at that? The other aspect of this is really moving forward on standards. What are the standards that could be used to look at this? One of the challenges in comparing different platforms is that it's different DNAs, different analyses, different cohorts being looked at. And one of the problems we run into as a community is that we then spend a lot of our own time and NIH's money validating these different platforms. So wouldn't it be great to set up a gold standard set of samples? For example, we know that CAP is going to have some DNA standards that will have to be looked at, so we should probably think about working with those and coming up with standards that could be used across these different endpoints. If all the manufacturers used these same samples, that would create a way to compare the different platforms as they come out. So what types of samples should these be? One possibility that was discussed is perhaps the seven that we use for HapMap. Again, these are just potential solutions that the committee needs to work on. We need to think about the diversity of these samples, and whether there are different complexities in the samples that are in there. If platform X can deal with the MHC better than platform Y, we would like to know that. So the idea is to create these gold standards. The action items that are going to be put into play: Heidi is going to link us to the CDC group and determine which of the two samples they will use for their QC, and whether these can be part of the DNAs used for CAP/CLIA. Laura is going to send me the links for that, so we'll link up with the CAP/CLIA group. The goal, then, is to write a white paper about the samples and the metrics that can be used to compare the sequencing platforms.

We also spent a fair bit of time — most of the time — on the analytical best practices. The key issue is to define standards and tools for analyzing the genome. These become issues, again, of whether we're talking about the same thing. What are the standards needed to assess quality? Duplication rates, minimum coverage, quality standards. For example, we had a short discussion around coverage: we tend to talk about how deep the coverage needs to be, but that depends on the platform being used. With some of the longer reads, you may need less depth, or more, depending on the accuracy. So we need to come up with definitions around those pieces. These are needed for measuring false positives and false negatives — again, the sensitivity and specificity issues. Standards need to be platform independent. Now, there will always be some platform dependencies, but in terms of making comparisons between platforms, there are some things that shouldn't change, and those are the issues we need to pull out. Other issues are the need for software, standards, and tools that feed into the diagnostic market, and data analysis tools, which are also developing very quickly.
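[Editor's note: as an illustration of two of the QC metrics named above — duplication rate and minimum-coverage breadth — here is a short sketch. The thresholds and inputs are hypothetical, not proposed standards.]

```python
# Sketch of two QC metrics from the discussion: duplication rate and the
# fraction of targeted bases at or above a minimum depth. Values below are
# toy inputs, not recommended thresholds.

def duplication_rate(total_reads: int, duplicate_reads: int) -> float:
    """Fraction of mapped reads flagged as PCR/optical duplicates."""
    return duplicate_reads / total_reads if total_reads else 0.0

def coverage_breadth(per_base_depth: list, min_depth: int) -> float:
    """Fraction of targeted bases covered at >= min_depth.

    The appropriate min_depth is platform dependent (per the discussion:
    longer, more accurate reads may need less depth than short reads).
    """
    if not per_base_depth:
        return 0.0
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)

depths = [35, 12, 80, 0, 41, 22, 19, 55]   # toy per-base depths over a target
print(duplication_rate(total_reads=1_000_000, duplicate_reads=85_000))  # 0.085
print(coverage_breadth(depths, min_depth=20))                           # 0.625
```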
It's tough to define the appropriate parameters for analysis. Software and databases are locked down rather than dynamically changing — this is what I talked about before. The idea, then, is to come up with datasets that can be used to compare these new tools: ways to benchmark their performance. The action items that were put forward were to collect a list of the datasets, with some information about them, that could be used for making comparisons. For the data to be part of this program, they'd have to have the correct consent for distribution to other sites, as well as documentation. For example, Debbie has a number of trios; those could potentially be looked at. Les has ten families, or ten patients, that he's looked at. So the idea is first to collect all these different DNAs that could be used as part of the analysis bake-off group, if you will. What would these DNA datasets look like? We then need to select the datasets to distribute, define the benchmarks, and put in place ways to analyze them. So that's the concept, and I think we've put in place some actionable items around it.

One place we didn't get to last night, but I left on the list — so if the committee doesn't agree with this, we can take it off — is the central repository for clinical comparisons. Heidi discussed that in some detail, and I think there still is that need. But if we can work through the wet lab pieces, the analysis pieces, and then some of the central repository pieces, I think we've now got a much narrower scope of action items and goals to be achieved by this committee. Jeff?

Two things that we actually didn't cover, but that occurred to me as you were doing your summary. The first is that the idea of the 3,000 genes for the initial set of standards is a good one; I just think that the subsequent goal has to go beyond the other 20,000 genes, because there are almost certainly important variants that are not in the genes, and we're going to need quality metrics for those as well. The second point is that there's an appropriate focus on the human genome in this discussion — this is NHGRI, after all. But if we can get this right here, that provides a path for people who want to study other genomes, for example plants, which might present different challenges to the sequencing technology.

Other comments, questions? Mark, you? No. Okay. Okay. Brad?

On the collection of data and cell lines: TCGA is doing three paired cell lines, germline and tumor-derived, at very deep sequencing, like 150X. Those data will all be posted this summer, including some variants of the dataset where they tweak it down computationally to mimic the poor allele fractions you might see in a tumor. Those will be available soon, and they're commercial cell lines.

Okay. Yeah. So I don't know whether you want to — it's a bit of wordsmithing, so you can tell me whether to stop. You had a slide where you were talking about the CAP, CLIA, and CDC samples. And I think what we want — and Heidi can correct me too — but hearing the CAP presentation of what's going to be in their checklist and what they were going to do, they're looking at what the proficiency samples are going to be.
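[Editor's note: one way the "analysis bake-off" described above could be organized, sketched here with made-up dataset and pipeline names: run each candidate pipeline on the same consented benchmark datasets and tabulate concordance against the agreed truth calls.]

```python
# Hypothetical bake-off harness. Dataset names ("trio_01"), pipeline names,
# and call sets are all invented for illustration.

def concordance(truth: set, calls: set) -> float:
    """Jaccard-style concordance between a pipeline's call set and truth."""
    union = truth | calls
    return len(truth & calls) / len(union) if union else 1.0

truth_sets = {"trio_01": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}}
tool_calls = {
    "trio_01": {
        "pipeline_A": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
        "pipeline_B": {("chr1", 100, "A", "G"), ("chr7", 300, "G", "C")},
    }
}

for dataset, truth in truth_sets.items():
    for tool, calls in sorted(tool_calls[dataset].items()):
        print(f"{dataset}\t{tool}\t{concordance(truth, calls):.2f}")
# trio_01  pipeline_A  1.00
# trio_01  pipeline_B  0.33
```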
So what they want is not necessarily the same samples as what you're looking at, or what CDC or NIST is looking at, because they need to check whether labs are getting the right results. If these samples are out there and the results are all out there, they're asking: well, how are we going to check that? So they're looking at somewhat different proficiency samples than the ones we were talking about. I just want to clarify that, because there was this intersection on that slide.

So what I was trying to say — and maybe we can't do this, so correct me if we can't — is that it would be ideal, at least in a simple construct, if we could use the same samples all the way through. Some things we can, some things we can't. I was trying to capture that where we can use the same samples all the way through, there would be an advantage. But obviously there are different needs around the different samples. So the same samples might be good for platform comparisons, which is what FDA is looking at, but not for proficiency testing; yet those same samples might be good for your studies, and also be the same ones that CDC or NIST is looking at, so that you have these cross-comparisons.

Okay. And I can add that I was just talking to Deborah before she left, and they do intend to use similar samples. I think more than one is necessary so that you can do proficiency testing. But I don't think they're going to diverge way off to something that doesn't have data on it. No, they just don't want exactly those. Mary?

We're all just asking: what's BIC? I have no idea. It's a breast cancer database. Yeah, right. So why was that listed? I guess I don't understand. BIC is a locus-specific database that's been maintained since about 1994 or 1995. It had data deposited from Myriad up until about 2007, has had national and international data deposited from multiple different sites, is probably the biggest locus- and gene-specific database, and is used by a lot of people. It has a very active steering committee that's been reclassifying mutations and working on uncertain variants. So it's an example of one of those locus-specific databases that's wonderful.

Then I guess something like PharmGKB also annotates and curates a lot of variants in a relatively small group of genes that are well curated, but that still doesn't get at the idea — were you talking about having some overarching system? And in that context, is it totally out of the question to expand dbSNP?

No, no, I don't think it's out of the question. I think it's a question of how we structure curation, and whether you deposit data into one system as a primary system and people curate within it — which is certainly what I would lean towards, so that everybody structures their data in the same way because it's going into the same system at the outset. But there's also the practical recognition that highly effective curation groups have operated around systems like BIC, and we want to retain that functionality. So I think we're all still open to thinking about the best strategy to capture effective, high-quality curation while ensuring that we all have efficient ways of interacting with that data through the pipelines we all want to use and create in our own environments.
And so I don't necessarily think the two are mutually exclusive, but we have to ensure consistency in terms of the structure and standards being used.

Do you see this as completely different from, or directly linked to, how data would be curated and displayed in EMRs?

I think the EMR environment is an extension, in the sense that once the data comes out of laboratories that are interacting with systems like ClinVar, et cetera, that would help ensure their reports containing variants are using the most accurate data. And in the conversations we've all had about wanting to improve our understanding of variants of unknown significance, allowing backward flow of EHR data into these variant curation systems, so that we can improve our understanding of variants, is the reverse flow that I think we want to think about as well.

I thought I would just give a quick anecdote. Our group does pharmacogenomic testing, and we're at least putting all the results in with the real gene name first. But all of the HLA testing that's been done by our HIV group for abacavir, in our same institution, goes into the medical record without any gene name attached to the result. And this is just one relatively sophisticated institution. So it seems like we have got to come up with some conventions about the basics of how these results go into the medical record.

So I agree. And what I think we settled on last night — correct me if I'm wrong — is that this working group was going to focus on setting the metrics and the standards around this, as opposed to how this was necessarily going to be reported. That needs to be addressed, but I think we have to start with some basics. What do we mean by these different definitions, and what does it look like when we have them? Then I think there will be a natural flow from that out toward what it means when you move to reporting. Does that make sense? So the reporting is downstream from this.

Well, but we have an RFA being funded to encourage us to put genomic results in the medical record, and nobody is even putting them in the medical record necessarily attached to a gene name. That doesn't bode well for future use of the data and for harmonizing things in the future.

So there is an HL7 working group that we worked with a number of years ago to define standards for reporting genetic variants, and that standard has been out for, I don't know, five years. Now it needs to be updated with more genomic location information, since genomic location, anchored to a genome build, is probably the primary way to locate a variant. One of the things we put in the grant is working with Stan Huff and his group to improve those standards so that this problem won't happen. It needs to be done; it's an important problem.

I was just going to add that the eMERGE group is working on this — Mark's part of that — and we're working with the Utah folks as well. I think the important takeaway here is that standards do exist. They may need to be modified, but any approach that isn't taking advantage of standards is doomed to be a one-off and will not persist. That's a takeaway for all of these projects: you have to use the standards that are out there. Otherwise, we're just not going to get anywhere. I think we'll move on.
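[Editor's note: as an illustration only — this is not the HL7 clinical genomics schema — here is the kind of minimal structured record the speakers are asking for, where every reported result carries a gene symbol plus build-anchored coordinates, so a result can't land in the chart without a gene name attached. All field names and coordinate values are hypothetical.]

```python
# Illustrative record structure for a variant entering the medical record.
# Not an actual HL7 message; field names and the example coordinates below
# are placeholders chosen for the sketch.

from dataclasses import dataclass

@dataclass(frozen=True)
class ReportedVariant:
    gene_symbol: str       # e.g., an HGNC gene symbol -- never omitted
    genome_build: str      # coordinates are meaningless without the build
    chrom: str
    pos: int
    ref: str
    alt: str
    interpretation: str    # the lab's clinical interpretation at report time

example = ReportedVariant(
    gene_symbol="CYP2C19",
    genome_build="GRCh37",
    chrom="chr10",
    pos=96541616,          # placeholder coordinate for illustration
    ref="G",
    alt="A",
    interpretation="Reduced-function allele; poor-metabolizer risk",
)
print(example)
```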
We can debate this a long time, obviously. So I guess Jeff is next. Right. Right — I was just actually trying to get his attention.