All right. Well, thanks for inviting me. So like Stephen, I went through the process of submitting the presubmission inquiry and so forth. And so this will be a little bit of my thoughts on the process, but also, hopefully towards the end, really starting to explore why validation of these types of tools can be very, very complicated, how quickly things move from analytic validity to clinical validity, how that impacts the study, and why I think that makes this so challenging for us at the interface with the FDA, where a lot of this starts to bleed over into medical practice. So that's where I'll be heading with this. So our project is called NC NEXUS. This is an exploratory project, again, through the NSIGHT Consortium. And like Stephen's group, we're doing exome sequencing. Actually, he's probably doing genomes and exome sequencing. So we're doing exome sequencing, and the goals of our study were to assess the performance of sequencing in a screening context. So really, a lot of the things that the FDA wanted to know about our study were the goals of the study; we'll know at the end of the study, hopefully, some of this information that you're asking us for. And so we have a plan to enroll 200 infants and children who have already been diagnosed with conditions through newborn screening. So these are kids with PKU, cystic fibrosis, et cetera. They've been taken care of; they have a diagnosis. And the goal here is to use these participants as a way to test whether the exome sequencing could adequately predict the presence of those diseases, right? So it's like studying what would have happened if we had sequenced them without knowing whether they were affected. Could we predict that that child had PKU or this other child had cystic fibrosis? And we also have 200 unknowns, really healthy newborns. So these are babies where there's no indication of any kind that there should be a condition that would be detectable by newborn screening. And that will largely be a study population with which we can begin to assess some of the specificity of the testing. And we'll get to that towards the end. So the other big part of our study was parental decision making about whether or not they should even have their child undergo exome sequencing, and their decisions about other types of information aside from the traditional newborn screening information, those conditions that would be medically actionable in childhood: other non-medically actionable information such as carrier status for recessive conditions, the presence of a gene mutation that would predict an adult-onset medically actionable condition, or the presence of a gene mutation or mutations that would indicate a non-medically actionable childhood condition. So these were some additional categories of information that we wanted to study, because if you envision using a genome-scale test as an adjunct to newborn screening, you're going to have to tackle head-on how to assess parental preferences to learn this information or not to learn this information. So the design of the study was really largely about the parental decision making. And these were essentially numbers that could be justified based on the cost of sequencing and the amount of budget that we would have. But enough subjects to be able to answer some of these questions, probably not enough to convincingly validate this test as a newborn screening tool. So our study was determined to be significant risk, and this is what it resulted in.
This is a stack of printed paperwork that got sent in, plus the two DVDs or CDs that got sent in. And I don't have the timeline, but this set us back probably nine to 12 months just to get through this whole process, partly because we were trying to figure out, not just through the pre-submission inquiry, what we needed to provide in a pre-submission, but also what you need to provide in the IDE itself. So I'm going to talk a little bit about analytic validation. This was difficult for us. We didn't quite know how to respond to the questions. We're using commercial saliva collection kits, so there's some documented information about the capacity of those kits to provide a high-quality DNA source, and we had done a little bit of piloting with a small handful of cases to just make sure that we got high-quality DNA. But the other thing was that we're doing automated DNA extraction in the UNC biospecimen core facility. It's a facility that has done hundreds of thousands of samples for different studies. We're going to let them do the DNA extraction and storage. So there was that sort of validation: what are they doing there? We're using commercial exome kits. Those have been validated by the companies. We weren't necessarily intending to do any additional validation of the exome library prep kits. Our sequencing is being done in a core facility at UNC that runs a number of HiSeq machines and has been doing sequencing for a number of NIH projects, including TCGA and our other Clinical Sequencing Exploratory Research project. And so we have lots of general experience using that core facility, but perhaps not the same level of intentional validation that might be done if you were a clinical lab and you needed to bring up an assay and make sure that you had clinical validation. And then we're using very standard bioinformatics pipelines that everybody's familiar with to generate the variant calls. So we have not independently validated anything, really. This is a kit of components, and we've strung them together. We do specify some of the things that we would do to quality control that. But it would be very difficult, I think, for us in the context of our grant to be able to independently validate this entire process to the extent that you would if you were planning to commercially market a product. So that was part of our learning curve, figuring out exactly how to respond to these types of things. As for the sections of the submission, you saw some of these in the previous talk. There's a report of prior investigations; this is really the description of our pipeline, our little pilot study of exome preparation from saliva DNA from eight samples, and so forth. And then there's a very significant section that's the investigational plan, which really reiterates the wet lab and bioinformatics, which now is the device, but then includes very detailed information about the rest of the procedure. How are you going to analyze the variants, and what categories of results will you return? How will the parents make decisions, and so forth? And that was all part of the investigational device exemption package, because you have to define the protocol. And then, of course, lots of appendices with all the detailed laboratory standard operating procedures. So that was what was contained in that stack of papers. And I think in terms of the validation of the sequencing, we really rely on a lot of prior publications from our group and others.
Stephen pointed to a very nice validation paper, and there have been lots and lots of them; I'll show you a few. And so, from our interactions with the FDA: to what extent could one rely on the community's validation of all of these tools in a variety of different settings, and how much of that had to be actually validated within your own setup? Those were some of the questions that we had. We had mentioned in our submission the sequencing of about 600 individuals, but we really didn't have extensive validation of knowns. That previous experience was in patients with undiagnosed disorders, and so, other than the patients that we had found to have a mutation, we didn't really have a control set per se of 600 patients with known disease mutations that you could then use to validate that you could find those mutations with exome sequencing. And we were doing Sanger confirmation with a greater than 99% confirmation rate. But I will point out that this was using a fairly stringent threshold for variant calling, and we'll come back to that; that's an important point. So in our discussions, there were questions about what kinds of false positive results there would be, what kinds of false negative results you could anticipate, and how you would estimate the likelihood of those. So just to define the term: analytic validity measures the ability of the assay to accurately detect an analyte. How often is the test positive when a mutation is present? That's sensitivity. And how often is the test negative when a mutation is not present? That would be specificity. And of course, analytic validity, as the CDC has defined it through the ACCE framework, also relates to the reproducibility and robustness of the assay. Does it work under varying conditions? And could you get relatively the same results if you tested the same sample twice, and so forth? But it all boils down to a two by two table, right? The variant is either present or absent in the individual, and the test either says that it's positive or negative, and then you get your true positives, your true negatives, and you have two types of false results. One would be that you fail to detect a variant that's present, and another is that you call a variant that's actually not there. So that's your classic two by two table. It's actually kind of a three by three table, because we're really dealing with two alleles, right? You could have a homozygous alternate genotype, you could have a heterozygous genotype, or you could be homozygous reference. And so a homozygous alternate call would be a true positive if the patient actually had that homozygous genotype. But in a way, it's also sort of a true positive if the test detects a variant in a patient who is actually heterozygous; you're just incorrect on the exact zygosity. And the same thing would be true for a heterozygous call when in fact the patient is homozygous for the variant. Now, this might lead you to the wrong conclusion about what that meant if you didn't have a secondary way to figure this out. Likewise, you would have false positives if you called an alternate allele in someone who is homozygous reference, and false negatives if you failed to detect an alternate allele. So those are the ways that one could think about the possible genotypes. But I'm gonna keep it simple and stay with the two by two table for the most part.
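As a concrete illustration of that framing, here is a minimal sketch in Python of the two by two calculation and of the three by three genotype version; the counts are invented for illustration and are not data from the study.

```python
# Minimal sketch of the 2x2 framing described above: given counts of true/false
# positives and negatives, compute analytic sensitivity and specificity.
# The counts below are made-up illustrative numbers, not data from the study.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of sites where the variant is truly present that the assay calls positive."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of sites without the variant that the assay calls negative."""
    return tn / (tn + fp)

# Hypothetical counts for a set of known sites:
tp, fn, fp, tn = 980, 20, 5, 995
print(f"sensitivity = {sensitivity(tp, fn):.3f}")   # 0.980
print(f"specificity = {specificity(tn, fp):.3f}")   # 0.995

# The 3x3 version keys on genotype rather than presence/absence, so a call of
# "het" in a patient who is actually hom-alt is detected but with wrong zygosity.
genotype_concordance = {
    # (true genotype, called genotype): interpretation
    ("hom_alt", "hom_alt"): "true positive",
    ("het", "het"): "true positive",
    ("hom_alt", "het"): "variant detected, zygosity wrong",
    ("het", "hom_alt"): "variant detected, zygosity wrong",
    ("hom_ref", "hom_ref"): "true negative",
    ("hom_ref", "het"): "false positive",
    ("hom_ref", "hom_alt"): "false positive",
    ("het", "hom_ref"): "false negative",
    ("hom_alt", "hom_ref"): "false negative",
}
```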
So just a brief thing on variant calling; I think everyone here knows this, we all understand the way that sequencing works. Effectively, there are a number of things in the sequencing process that can affect variant calling: the base quality of the actual reads; how complete the reference is and whether our reads are getting mapped to the right place; the genomic architecture, whether you're dealing with something that has low-copy repeats or some kind of simple repeat, triplets and so forth; and how much genetic variation there is. And essentially the variant calling becomes a statistical inference based on the evidence that the sequencer generates. And the algorithms are tunable, so you can adjust the sensitivity and specificity of variant calling based on how you set your algorithms. That's a really key point, and I think it's one of the things that came through in our interactions with the FDA. This is one of the reasons why they want you to lock down the process: because there are so many ways that tweaking one thing here, tweaking one thing there, can lead to changes in the analytic performance. So, a few things that could cause a false negative. You might have a library preparation, like an exome, that doesn't amplify a particular region, and that region happens to be where the variant is present, so you're gonna have a false negative because you didn't cover that region. That may be a little bit less of a problem with whole genome sequencing, but there still may be regions that don't get sequenced very well, a low coverage region, for example. The reference genome itself may have some incompleteness, such that reads are getting mapped in different places, and there may be a clinically relevant variant in a part of the genome that hasn't been finished, so your variant calling isn't gonna accurately reflect that. Or there may be certain types of variants, like triplet repeats or copy number variation, that really don't get called very well with your particular pipeline. So those are all possible sources of false negatives. False positives could come from sequencing artifacts, and there are well-known predilections in different sequencers for the types of mistakes they make. Certain platforms may do less well with small indels, for example. The genomic architecture of a particular region, like whether there's a homopolymer in place, might affect your variant calling in that particular spot. Whether there's a pseudogene present might affect your variant calling. And so these are many of the reasons, not all of them, for false positives. To me, these are really still a technical blind spot without a widespread gold standard truth set. Now, there are some moves in that direction that I'll mention, but the average researcher, who doesn't have access to multiple sequencers sequencing the same sample or to running multiple algorithms to do that comparison, may not really have a perfect sense of what the false positives and false negatives are on their particular platform. Now, we can minimize these. The false positives that are sequencing artifacts and so forth can be minimized by using some orthogonal method. So you have a secondary method, Sanger sequencing or another highly accurate method for determining sequence variation, and you can convert a lot of the false positives to true negatives. These were things that were picked up as test positives on your sequencer; your secondary analysis showed that they were not there, so you can then say that they're true negatives.
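As a small illustration of that reclassification step, here is a minimal sketch, with invented calls, of splitting the sequencer's test-positive calls by whether the orthogonal method confirms them: the confirmed ones are reportable true positives, and the unconfirmed ones are set aside as presumed artifacts.

```python
# Sketch of orthogonal confirmation as described above: NGS test-positive calls
# are checked by a second method (e.g., Sanger), and calls that fail confirmation
# are treated as true negatives rather than reported findings.
# The variant names and confirmation results here are invented for illustration.

def triage_positives(calls):
    """Split NGS test-positive calls by orthogonal confirmation result."""
    confirmed = [v for v, ok in calls if ok]         # reportable true positives
    reclassified = [v for v, ok in calls if not ok]  # presumed artifacts -> true negatives
    return confirmed, reclassified

# Hypothetical test-positive calls and their confirmation results:
calls = [("var_A", True), ("var_B", False), ("var_C", True), ("var_D", False)]
confirmed, reclassified = triage_positives(calls)
print("confirmed:", confirmed)         # ['var_A', 'var_C']
print("reclassified:", reclassified)   # ['var_B', 'var_D']
```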
Now in this case, it actually matters a little bit less how many false positives the sequencer's gonna give you, if you have a secondary method to adjudicate those, right? So this is an important consideration if you think about what your device is. If your device includes secondary confirmation, in which you're gonna be able to eliminate some of the false positives, then that needs to be considered part of the device. Now, there is a cost to orthogonal testing: if you are having to go and confirm 10 predicted variants for every one real one, then that's a research cost, and you have to think about whether that's the right decision. The other possible consequence there, and this gets to what Stephen was saying, is if the orthogonal method is treated as the truth. If Sanger sequencing is the truth about that variant, then you're gonna actually lose some true positives, and the rate of this is gonna be critical, right? How many false positives can you reclassify versus how many true positives do you lose? In that case, the technical false negatives are gonna be influenced by your orthogonal test in addition to your next generation sequencing test. So it's not just what the false negative rate of the next-gen sequencer is, but what's the false negative rate of confirming those findings with Sanger sequencing, and how often does that affect your outcome? So that's an important consideration for these secondary confirmations. So going back to the three by three table, you could convert your ref/alt calls to the correct alt/alt genotypes if your Sanger sequencing shows that the patient actually is homozygous and your sequencer was off, or vice versa, as well as validating the test result itself. So you can add, to the benefit of having the secondary confirmation, the ability to adjudicate whether you've got the right zygosity. I think that is an added benefit of doing Sanger confirmation, even when we think that those calls are gonna be very high quality. But, and we were talking about this a little earlier, the reality is that your test positives and test negatives really depend on the threshold you set to define what is a positive and what is a negative. And this comes at the level of your variant calling algorithm. If you set a threshold that is going to allow more calls, then you're gonna have a different ratio of these things. So just to give an example: if I set a stringent threshold such that only the really high-confidence calls get called positive, then I'm gonna have very few false positives, because I've really ratcheted up the specificity of my test. But I'm gonna have more false negatives, because there are gonna be some low quality calls that are real that I lose because I've set my threshold too high. Likewise, if I have a very relaxed threshold for my variant caller, that's gonna mean fewer false negatives, because those low quality calls that actually end up being real, I'm gonna see them. But I'm gonna have more false positives, and then, if I'm doing Sanger sequencing, I have to do a lot of extra sequencing at a pretty sizable cost per variant. If you think about roughly $100 a variant, you could very quickly get up to the full cost of the next-gen sequencing run in Sanger confirmation alone. So this, I think, is an important point: where do you set the threshold on variant calling? All right, so essentially this becomes a pragmatic decision. How much Sanger sequencing do you want to do, and how much are you willing to tolerate false negatives?
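To make the threshold trade-off and the confirmation-cost arithmetic concrete, here is a toy sketch, not any real caller: candidate sites with invented quality scores and truth labels, tallied at a stringent and a relaxed threshold, with the Sanger cost computed from the roughly $100-per-variant figure mentioned above. The assumed cost of the sequencing run itself is also a made-up number.

```python
# Toy illustration of the threshold trade-off and the Sanger confirmation cost.
# Quality scores, truth labels, and the assumed run cost are invented;
# the ~$100-per-variant Sanger figure is the approximate one cited in the talk.

SANGER_COST_PER_VARIANT = 100.0   # approximate, from the talk
ASSUMED_RUN_COST = 1000.0         # hypothetical cost of the sequencing run itself

candidates = [
    # (quality score, variant truly present?)
    (55.0, True), (48.0, True), (12.0, True), (9.0, True),
    (40.0, False), (11.0, False), (8.0, False), (7.0, False),
    (6.0, False), (5.0, False), (4.0, False), (3.0, False),
]

def evaluate(threshold: float):
    """Tally calls, false positives, missed true variants, and Sanger cost at a threshold."""
    called = [(q, real) for q, real in candidates if q >= threshold]
    fn = sum(1 for q, real in candidates if q < threshold and real)
    fp = sum(1 for _, real in called if not real)
    sanger_cost = len(called) * SANGER_COST_PER_VARIANT  # confirm every call
    return len(called), fp, fn, sanger_cost

for name, threshold in (("stringent", 30.0), ("relaxed", 2.0)):
    n_called, fp, fn, cost = evaluate(threshold)
    print(f"{name:>9} threshold: {n_called} calls to confirm, "
          f"{fp} false positives, {fn} real variants missed, "
          f"Sanger cost ${cost:,.0f} "
          f"({cost / ASSUMED_RUN_COST:.1f}x the assumed run cost)")
```

With these made-up numbers, the stringent threshold misses two real variants but sends only three calls to confirmation, while the relaxed threshold misses nothing but pushes the Sanger bill past the cost of the run itself.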
So you could empirically determine that. I think that's something that our field should do, and I think there's a role for really well-studied pipelines to determine where those thresholds should be set based on different sequencing technologies and so forth, but that costs money. And I don't necessarily think that that's something that the average researcher should be expected to put into their grants if this validation can be done as a community. So how much should we be responsible for quantifying that before getting involved in research? Or can we accept that there are gonna be these choices that affect these parameters? And really, in the end, does it depend on the research question in mind? That's probably the answer: it really depends on what you want to do with the information. So a great deal of work has been done; we saw a little bit of it this morning. There have been lots of comparisons of different sequencing platforms sequencing the same sample, different variant calling tools analyzing the same raw reads, and multiple combinations of sequencing and variant calling tools. And the bottom line is that nothing is quite perfect. There's still room for improvement, and we're always getting better. And that gets to Stephen's point about incremental improvement, and that you would always wanna be trying to use the best, newest thing. All right, let's see. So here are a few examples I just pulled out. This was an early one, from 2013: low concordance of multiple variant calling pipelines. Uh-oh, we're in trouble here because we're not concordant. Then you've got resources like the Genome in a Bottle Consortium starting to put together a benchmark set of SNP and indel calls, systematic comparisons of different variant calling pipelines, and now even more work on extensive sequencing of seven human genomes to characterize benchmark reference material. So one outcome of this could be to think about how these types of resources could be leveraged in our environment to say what an acceptable validation would be for the average researcher doing this type of research. Do you need to sequence seven different genomes using your platform and find that you have a certain level of concordance with the gold standard calls? What would be those benchmarks? I think that's an important outcome of this conference. FDA has an effort of its own, precisionFDA, thinking about how you evaluate NGS assays, so maybe we'll hear more about that as well. So this is just to reiterate the Genome in a Bottle Consortium, and again, how much of this can be done as a community and how much of it should be done by the individual researcher when they need to submit an IDE.
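As a concrete illustration of the kind of benchmark comparison just described, here is a rough sketch that compares a pipeline's calls against a Genome in a Bottle-style high-confidence truth set and reports precision and recall. The variant keys are invented, the comparison is deliberately naive (SNVs keyed on chromosome, position, ref, and alt), and real benchmarking tools handle representation differences, confident regions, and indels far more carefully.

```python
# Naive sketch of benchmarking pipeline calls against a high-confidence truth set.
# Variant keys and counts are hypothetical; real comparisons use dedicated tools.

def precision_recall(truth: set, calls: set):
    """Precision and recall of a call set against a truth set of variant keys."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical variant keys: (chrom, pos, ref, alt)
truth_set = {("1", 1000, "A", "G"), ("2", 5000, "C", "T"), ("7", 117559590, "G", "A")}
pipeline_calls = {("1", 1000, "A", "G"), ("7", 117559590, "G", "A"), ("9", 200, "T", "C")}

precision, recall = precision_recall(truth_set, pipeline_calls)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")  # 0.67, 0.67
```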
Okay, I'm gonna switch over a little bit and start talking about where it gets really complicated, and that is clinical validity. Analytic validity is one thing: you can sequence a sample, you can call variants, you can mathematically determine how well you were calling those. But whether it means something is critical to the research that we do in genomic medicine. Is it a pathogenic, disease-causing variant or a normal polymorphism? Who decides? Is the gene truly associated with disease? Who decides? How well does the case-level data that you have on your patient or participant, the phenotype information, match up with the genotypic information to provide you with an answer for that patient? Have you solved that case? And so again, you can think about the ideal two by two table. The disease now, not the variant but the disease, is either present or absent in this patient, and your test result is gonna say that it's either positive or negative, and so you could think about whether you can estimate a true positive rate, a true negative rate, a false positive rate, a false negative rate. But it's a little bit more complicated, because we have this bugaboo in our field of uncertain results. We have variants of uncertain significance. So those fit in there somewhere between a positive and a negative result, some of which might be positive because they might ultimately be determined to be pathogenic variants, and some of them are gonna be negative because ultimately they'll be determined to be benign, and so genetic testing is not ideal in this regard. So in variant pathogenicity assessment, we review a number of different types of heterogeneous data: the prior literature, what other people have said about that variant; how frequent it is in control populations; what the prediction algorithms say about its effect on the protein. Is it gonna truncate, is it a missense? If it's a missense, is it gonna be damaging, et cetera? Are there functional assays that have actually interrogated that variant before? Has it been seen to segregate in a family, or do you have your own segregation data on that variant? And these are guidelines that have been put out by the ACMG on how you should think about these different parameters in assessing a variant. And what it boils down to is five categories of classification. There's benign, likely benign, pathogenic, likely pathogenic, and then VUS in between. And one would assume that pathogenic and benign variants should be greater than 99% certain to be pathogenic or benign. But the thresholds for what constitutes a likely pathogenic variant or a likely benign variant differ. So there's an international consortium for cancer research that thought that to call something a class four, or likely pathogenic, that should be a 95% likelihood. The ACMG has said this should be a 90% likelihood. Each lab has its own rubric for deciding whether to call something likely pathogenic or a VUS. And really there's not a good generalized method that lets you quantify that anyway. And so this VUS result spans a very wide range of probability, where one lab might call it likely pathogenic and one lab might call it a VUS. And they may just disagree about that. And that's part of the practice of medicine.
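As a rough sketch of how those probability thresholds play out, here is a small function that maps an assumed posterior probability of pathogenicity onto the five tiers, with the likely pathogenic cutoff as a parameter (90% per the ACMG framing versus 95% per the cancer consortium framing mentioned above). Collapsing the classification to a single probability is itself a simplification, since in practice labs combine heterogeneous evidence using their own rubrics.

```python
# Simplified sketch of the five-tier classification as a function of an assumed
# probability of pathogenicity. The >=99% bounds for pathogenic/benign and the
# 90% vs 95% likely-pathogenic cutoffs are the figures cited in the talk; the
# mirrored likely-benign cutoff and the single-probability framing are assumptions.

def classify(p_pathogenic: float, likely_cutoff: float = 0.90) -> str:
    if p_pathogenic >= 0.99:
        return "pathogenic"
    if p_pathogenic >= likely_cutoff:
        return "likely pathogenic"
    if p_pathogenic <= 0.01:
        return "benign"
    if p_pathogenic <= 1 - likely_cutoff:
        return "likely benign"
    return "VUS"

# The same variant can land in different tiers depending on the lab's cutoff:
print(classify(0.93, likely_cutoff=0.90))  # likely pathogenic
print(classify(0.93, likely_cutoff=0.95))  # VUS
```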
So let's move on to gene-disease association. How strong is the evidence that a particular gene is implicated in a disease? That's based on the genetic evidence and the functional evidence that have been accumulated. And this leads to questions like: what genes should we even be including in a multi-gene panel? If you've got a multi-gene cancer panel with 50 genes on it, are all those genes actually valid causes of human hereditary cancer syndromes? And when you're looking at a genome-scale test like whole genome sequencing or whole exome sequencing, which genes should you look at? Which genes do you recognize as being disease genes? ClinGen has been working on this problem, and it's something I've been involved with. I won't go into too much detail other than to say that it's hard. We're trying to define some qualitative categories in which you could say that there are some definitive gene-disease associations that have just been repeatedly demonstrated; there's just no question, everyone agrees. There's gonna be a strong category, where there's a strong set of genetic and functional evidence, multiple studies, it's been replicated, et cetera. There could be a moderate level of evidence, with maybe a lesser amount of genetic evidence, fewer cases, fewer experimental studies done, and then limited evidence. No evidence reported would be genes where there's just no evidence of human disease at all, though there may be animal models or they may be candidate genes. And then there are some categories of conflicting evidence. So we're working through this process, and hopefully every gene at some point will have its gene-disease association classified as definitive, strong, moderate, or limited; of course, that'll take a long time. So until then, we have to do the things that Stephen mentioned, which is to say, have your own internal rubric for what gene-level evidence reaches your group's threshold for reporting. And then, for case-level data, how well does the phenotype fit, right? The analyst needs to think about whether the phenotype makes any sense at all with the gene and the variant that you've found. If it does, then the finding may be a diagnostic finding. If it doesn't really fit with that finding, then it's a secondary finding, and that's an important distinction. We have lots of questions to answer in the field, like how much phenotype data do you need? Can you accurately predict with no phenotype data at all, like we want to do in newborn screening? If you are going to analyze a set of genes, how do you prioritize? Which ones make the most sense for that patient? And then, finally, how do you categorize your results? So I'm gonna show you a few ideas here, and this graph seems a little bit busy, but I wanna walk you through it, because I think it outlines how our group thinks through this, and it's probably reasonably close to what other people do. So on this axis I've got the variant pathogenicity, and on this axis I have the degree of phenotypic match. So over here, the patient's phenotype matches perfectly to the gene in which the variant is found, okay? And over here, the patient has no phenotype whatsoever that fits with the gene and the mutation that was found. And so if you have a pathogenic mutation, or a pair of pathogenic mutations for a recessive condition, you would say that's a definitive positive. This patient's phenotype matches perfectly; it's a pathogenic mutation or two pathogenic mutations; that's the answer, definitive. If you have likely pathogenic, or perhaps one likely pathogenic and one pathogenic, you've got a probable answer. It's almost certain that you've got the right answer, but maybe you'd like to make sure that this other variant is truly pathogenic. If you have any uncertainty about the variant-level evidence, then you have an uncertain result. It's a possible answer, but it's not convincing. And of course, down here in the benign and likely benign, you essentially have negative results. So where it gets confusing is: where do you draw the line here? How far into the degree of phenotypic match are you willing to go to say it's a definitive answer? People will speculate about expansion of phenotypes, right? You find a gene mutation in someone who's got a partially overlapping phenotype. Maybe that's just the lower end of the spectrum for that particular condition. We just don't know that yet.
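Here is a minimal sketch that encodes one plausible version of that two-axis rubric, mapping a variant classification and a degree of phenotypic match to a result category; the category names and exactly where the lines are drawn are the judgment calls the talk is pointing at, not a standard.

```python
# One plausible encoding of the pathogenicity-by-phenotype-match grid described
# above. The thresholds and labels are illustrative assumptions, not a standard.

def categorize(classification: str, phenotype_match: str) -> str:
    """classification: pathogenic / likely pathogenic / VUS / likely benign / benign
    phenotype_match: strong / partial / none"""
    if classification in ("benign", "likely benign"):
        return "negative"
    if phenotype_match == "none":
        # Fits the gene but not the patient: a candidate secondary finding,
        # reported (or not) under the study's secondary-findings policy.
        return "secondary finding (if reportable)"
    if classification == "pathogenic" and phenotype_match == "strong":
        return "definitive diagnostic result"
    if classification in ("pathogenic", "likely pathogenic"):
        return "probable diagnostic result"
    return "uncertain (possible) diagnostic result"

print(categorize("pathogenic", "strong"))          # definitive diagnostic result
print(categorize("likely pathogenic", "partial"))  # probable diagnostic result
print(categorize("VUS", "strong"))                 # uncertain (possible) diagnostic result
```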
So there's some movement towards defining how much phenotypic match you need to have to call something a definitive or probable diagnostic result versus a secondary finding. And there's this big chunk of variants of uncertain significance that are in genes that have very little overlap with the phenotype, and those we don't report. And so this is the complexity of thinking through the results at a clinical level. And I'll skip through this; this just deals with a single heterozygous variant and how you might think about that. And again, just to reiterate, this makes it very difficult to validate clinical sensitivity without having tens of thousands of known positives and tens of thousands of known negatives, where you phenotype everyone and can run all of your pipelines and be able to show how often you would get true positives, true negatives, and so forth. How much time do I have? Am I out of time? Okay, so I better not give my examples; maybe I'll come back to them at the end, and we'll maybe get to those in discussion.

All right. Thank you, Jonathan. And we'd also like to thank the over 100 people who are joining us via webcast and let you know that if you have questions, please submit them and we will attempt to get to all of them during the question and answer session. Next we have Dr. Sharon Leong, who's a regulatory scientist in the Division of Molecular Genetics and Pathology in the Office of In Vitro Diagnostics at FDA.