So today we're going to be discussing the role of analytic validation in next-generation sequencing tumor genomic profiling. And let me congratulate all of you who have dialed in for such an exciting-sounding topic. This topic is genuinely complicated, and the fact that we're going to spend some time together beginning to scratch the surface of its intricacies is very commendable. I hope that this conversation will inspire you to ask questions and continue to dig. So today what we're going to cover is an initial overview of the technical challenges of doing tumor genomic profiling and why this is particularly germane to the specifics and extent of the analytic validation that each individual lab performs. We're going to go into depth on exactly what we mean by analytic validation, and who in the regulatory community is currently evaluating the quality and extent of each lab's analytic validation. And then some clinical implications, and at the very end some questions that will hopefully guide your discussions in the future. So let's frame up this discussion with a clinical case, which I think really emphasizes what we all hope to see within our lifetime: the promise of personalized cancer care. Here are two patients who have diffusely advanced metastatic melanoma and who have failed all forms of conventional therapy. The concept in personalized cancer care is that the tumor cells depend on abnormal signaling and growth for survival. And this is related to genomic changes, changes at the DNA level, which have driven these cells to become different from the normal and well-behaved cells in the body. So the first step is to identify the genes of interest that have been mutated and that are potentially producing the protein target. This is where the diagnostic test comes in and why it is so important in discriminating next steps for patients. Step two is then to treat with small molecules that inhibit the abnormal pathways, that hit the Achilles heel of the tumor. I'm sorry, there's a lot of noise on the line. Treat with small molecules that inhibit the abnormal pathways within these cells and spare the patients the vast majority of side effects. So in this rubric, each of these patients had an accurate test that identified a genomic alteration, which allowed the prescription of a medication that specifically targeted their tumor and resulted in a dramatic change in the course of their illness. So in this case, the rubric of a diagnostic test predicting a specific therapy worked well. But this is not as simple as it might appear, because we have not just one drug or one genomic alteration; we have over 150 intracellular targets for which we know the pharmaceutical industry has compounds in development that will be hitting the clinics, either in clinical trials or for use in actual treatment of patients, in the next five to ten years. So getting the diagnostic pairing correct has never been more important. We've now established that there are molecular technologies moving into the clinic to predict responsiveness to drugs, and that patients' physicians are going to rely on these results for clinical decision-making. And with this complexity of potential therapies, labs and healthcare as a whole have moved towards multiplex technology to more broadly assess the genomic drivers.
And this has added a level of complexity that we've never experienced before in effectively dividing patients into the most relevant groups for a given clinical intervention, which may actually mean withholding a particular therapy when a particular driver is not present. In order for this rubric to be successful, we have to separate two very important concepts. The technical variability that happens at the level of the assay must be minimized so that we can begin to understand the inherent biologic variability present in each patient's tumor. If these two sources of variability cannot be separated, the rubric is not as effective at predicting which patients should go on a particular therapy. So let's take a look at an example of a next-generation sequencing assay workflow. People talk about next-generation sequencing and sometimes think of all of these tests as one uniform type of test, just because they employ this technology known as NGS. But that's not the case. The sequencing itself describes only a single step in the middle of this workflow diagram; it is shown here as the Illumina HiSeq, under the word sequencing in the middle. The assay itself is actually comprised of a number of upfront processing steps that start at the moment a specimen is procured in an intervention at an outside hospital, and it doesn't end until an informative clinical report is issued. So what we have to capture here is that there are multiple steps. It's a highly complicated workflow with multiple tests that each have to be validated and understood. And in particular, there has to be an appreciation for the preanalytic variables that are outside the control of the NGS laboratory. These include the fixation of the specimen, the procedure that was used for collection, the age of the specimen, and the storage conditions if this is an archival piece of tissue. These need to be evaluated so that each individual lab understands the impact of each of these variables on the results that will eventually be delivered. The second issue we need to address as part of the complexity of a validation scheme is that there is not just one or two analytes that need to be evaluated. This is just a selected list of genes that have been recognized such that the presence or absence of a mutation directs the patient toward or away from a particular therapy. The issue that confounds this list, which we and the scientific and medical literature are adding to every day, is the fact that any one of these genes can be altered not just in one way, by the point mutation that people typically think of when they invoke the word mutation, but in four discrete and different ways. So here we have a cartoon which illustrates one mechanism, copy number alteration, where the gene sequence itself is totally normal, but instead of the two copies present in normal diploid cells, we have more than two copies. This is the case in HER2-amplified breast cancer, which we know is a target for Herceptin and other drugs. The second mechanism is base substitution. This is the most commonly considered alteration, where one base in the sequence is changed to another, and this leads to a change in the coding and in the amino acid sequence of the eventual protein.
The third category of alteration is insertion/deletion events, in which small areas, particularly potentially regulatory areas, are either duplicated or deleted, or in which insertions or deletions of non-multiples of three cause frame shifts in the coding sequence, leading to early termination and a lack of a complete protein being produced by the cell. And finally, rearrangements, where two pieces of chromosomes interchange with one another, forming chimeric proteins that function in novel ways and sometimes drive tumor cell growth. This is the case for the EML4-ALK fusions in lung cancer that I'm sure you are aware of. The third complication to this entire process is that the testing needs to be performed on the clinically available specimens that patients are having collected in the course of routine care. And as we've moved towards smaller and smaller biopsies, with minimally invasive procedures that have less recovery time and fewer complications associated with the sampling, the tests need to accommodate this lower input amount of material, as well as the routine processing that occurs in a diagnostic pathology lab, such as formalin fixation and the effect that it has on the nucleic acid within the cell. And finally, an analytic validation has to address and understand the fact that in routine care, a sampling of a tumor is often dominated by the background normal cells from the patient, and only a small fraction of any piece of tissue that is collected actually contains tumor. So what we see here, from an experiment that we did in our laboratory, is the relationship between the purity of the tumor, i.e., the relative proportion of the extracted DNA that's coming from the tumor cells, and detection sensitivity. What you have to understand in this type of assay is that all of the DNA from all of the cells present in a particular sample is extracted together to form the input that's being sequenced. And the sensitivity that different techniques have for detecting alterations is affected by the amount of tumor that goes in to begin with. So in this example, in an almost 100% pure tumor, a heterozygous mutation, like a dominant base substitution such as a KRAS alteration, would have a mutant allele frequency of 50%, meaning half of the DNA coming from this tumor would carry the alteration. And the sensitivity of a standard capillary sequencing assay, the older type of Sanger sequencing, would only be 93% for detecting this even in a 100% pure tumor. If we drop down to a 40% pure tumor, where the mutant allele frequency would be 20% or less, the older method, currently considered the gold standard, would only have about a 55% sensitivity for detecting these alterations. In reality, what we need is to be able to detect alterations at very low tumor purity and mutant allele frequency, or at least to understand the performance characteristics of an assay well enough to know when a negative result has to be caveated with the possibility that it is a false negative, because the mutant allele frequency falls below the limit of detection of the assay. So how are we going to address these challenges in oncology next-generation sequencing and genomic profiling, especially in light of the fact that there are numerous different tests being used by different laboratories?
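To make the tumor purity arithmetic above concrete, here is a minimal sketch in Python, assuming a diploid tumor with a heterozygous somatic mutation and no copy number change at that locus; the limit-of-detection value is purely illustrative and not a property of any particular assay.

```python
def expected_maf(tumor_purity, mutant_copies=1, total_copies=2):
    """Expected mutant allele frequency in bulk-extracted DNA.

    tumor_purity: fraction of the extracted DNA that comes from tumor cells.
    mutant_copies / total_copies = 1/2 models a heterozygous variant in a
    diploid tumor with no copy number change at the locus.
    """
    return tumor_purity * (mutant_copies / total_copies)

# The two scenarios from the talk:
print(expected_maf(1.0))  # 100% pure tumor -> 0.50 (50% MAF)
print(expected_maf(0.4))  # 40% pure tumor  -> 0.20 (20% MAF)

# Illustrative only: a negative call should be caveated as a possible false
# negative when the expected MAF falls below the assay's limit of detection.
ASSUMED_LOD_MAF = 0.05  # hypothetical threshold, not a real assay specification
for purity in (1.0, 0.4, 0.1, 0.05):
    maf = expected_maf(purity)
    flag = " (below assumed LOD, possible false negative)" if maf < ASSUMED_LOD_MAF else ""
    print(f"tumor purity {purity:.0%}: expected MAF {maf:.1%}{flag}")
```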
So how do you assess the ways in which they are different from one another? Well, they may have different genes being analyzed, they may assess different amounts of each gene on the test, and they have different approaches to enrichment of the particular genes on the assay, PCR versus hybrid capture. Each of these approaches has implications for the sensitivity and specificity of the test. Which instrument is being used? This is the concept of which box they have, and from which manufacturer. And then, which of those four mutation types can be detected, and in what clinical context? So on top of all of this, how do we know that one of these various approaches has value and can be trusted to provide accurate and reproducible results for a patient population? Well, here's a comment from the NIH-DOE Task Force on Genetic Testing, which says the reality is that there is no assurance that every laboratory performing genetic tests for clinical purposes meets high standards. So with this framing, what should a lab do? Well, in order to understand the product they are producing, which is the results they'll be delivering, a lab should start with an analytic validation. So what is this? This is a process by which you determine whether an assay is able to discriminate the presence or the absence of an event that it was designed to detect. And it has two basic components: a measurement of accuracy and a measurement of precision. Accuracy you can envision as darts being thrown at a dartboard: how often does the dart hit the bullseye, and how close does it get? That is described by measures such as sensitivity, the ability to correctly identify patients who have a disease; specificity, the ability to correctly identify the patients who don't have the disease; and then, based on the prevalence of a particular condition in the population, the positive and negative predictive values of the test. That is, for this particular patient with a positive or negative result, how likely is it that the result reflects the actual status of the patient? Precision, on the other hand, is the concept of how well the darts cluster, even if they're nowhere near the bullseye. This is the measure of how much random variation there is in a test, and it's described by reproducibility and repeatability. Why does analytic validation matter? Well, roughly 70% of medical decisions, it has been said, are based on diagnostic test results of one kind or another. And these results stratify patients into subsets which get very different types of interventions or counseling. So the analytic validation helps assess the reliability of the data that's being given to the clinician, which is feeding their medical decision-making. So right now, who evaluates analytic validation? Well, in general, there are a few organizations that provide this assessment and licensing. But in the area of next-generation sequencing, there is no single standard or guideline that establishes what the gold standard is for an analytic validation. So we're just going to briefly go through these agencies to describe their role in this regulatory environment. CLIA is the minimum bar that allows a laboratory to deliver tests which will prompt clinical decision-making, and they're charged with ensuring accurate and reliable test results.
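As a quick reference for the accuracy measures just described, here is a minimal sketch that computes them from a 2x2 table of test calls versus true mutation status; the counts are invented for illustration and do not come from any real validation.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 confusion matrix.

    tp: mutation present and detected      fn: mutation present but missed
    fp: mutation absent but called         tn: mutation absent and not called
    """
    sensitivity = tp / (tp + fn)  # how often true mutations are detected
    specificity = tn / (tn + fp)  # how often wild-type is correctly called
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts for illustration only:
sens, spec, ppv, npv = accuracy_metrics(tp=95, fp=2, fn=5, tn=98)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, PPV={ppv:.2f}, NPV={npv:.2f}")
```

Note that PPV and NPV depend on the prevalence of the alteration in the tested population, so the same sensitivity and specificity can translate into very different predictive values in different clinical contexts, which is why the talk ties them to prevalence.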
They inspect laboratories on an every-two-year basis, and they review both the FDA-cleared and approved tests being utilized in the laboratory and the laboratory-developed tests. These are tests that the lab has either assembled from or modified from other FDA-cleared or approved products, or has generated entirely on its own to meet a clinical need. Within CLIA, there are no minimum thresholds that must be met specific to NGS testing. The second regulatory body to consider is the College of American Pathologists. This is a credentialing agency that laboratories can voluntarily subscribe to for inspection and accreditation. The inspections are also performed on an every-two-year basis, and their checklists mainly try to assess the quality management and quality control of lab testing, personnel, and lab safety. They have recently added some molecular pathology-specific checklists with sections that address next-generation sequencing validation and the ongoing QA and QC, but the recommendations are fairly broad and open to quite a bit of interpretation. The next regulatory body is the New York State Clinical Laboratory Evaluation Program. This is a license required of every laboratory that wants to perform testing on New York State residents, and they have elevated the bar for licensing of molecular tests and next-gen sequencing. It's currently considered one of the most rigorous certifications that a test can go through outside of the FDA. Recently, the Palmetto MolDX program has also, in this absence of clear regulatory guidance, established some components that they will be evaluating in their technical assessments, specifically around the analytic validation of NGS-based tests, in order to qualify them as covered tests. These components include sensitivity, specificity, and precision. And when a test has gone through this technical assessment and has been deemed to be covered, it will be listed on the MolDX website. And finally, the FDA. The FDA typically is informed of tests and their performance characteristics before a test goes to market, if the test is going to be FDA approved or cleared. However, they have been practicing enforcement discretion with regard to laboratory-developed tests for many years. In reality, most genetic and genomic tests are not FDA-approved products but are lab-developed tests. Because of this, and the implication for pairing with FDA-regulated drugs, this is a keen area of interest, and it has led to a draft LDT guidance issued in October of 2014 and a diagnostic test workshop, including a number of thought leaders, held just about a year ago to advise the FDA on how they should proceed. So the next two slides show some comparisons between the New York State, MolDX, and CAP guidance around things that you would want to consider in a validation. New York State is the only one that specifies how many clinical specimens they want to see results on for licensure, and this is just 50 specimens. The analytic sensitivity and specificity requirements from MolDX include assessment of a limit of detection, to be established for the minimum amount of DNA that is input into a test. New York State also wants an assessment of the lowest mutant allele frequency that can be detected by a particular test.
And this is going to become more and more pertinent to oncology specimens as we continue to have low-tumor-purity samples, where the risk of false negatives is very high, and as we proceed into an era where patients will have exposure to targeted therapy and will subsequently develop subclonal resistance mutations, which will indicate that the patient should stop receiving a particular drug. Some other things that are specifically addressed in these guidances include precision, the stability of the sample and reagents, reference intervals, and some quality control issues that need to be in place. The differences between these guidances are listed here: only New York State and CAP have established key performance metrics for the entire process, from that beginning stage of extraction through data analysis. And as we already touched on, the lower limit of detection has been called out specifically as important by MolDX. And finally, and somewhat surprisingly, the only group to address a positive or sensitivity control is the New York State guidance. This is really important when we think of other lab tests; the idea that we would routinely run a test without including a positive control for the assay would seem pretty improbable if we were talking about a chemistry panel or a CBC or some other blood test. So let's look at an example of an NGS validation for a complex next-generation cancer genomic profiling assay. Here's an example, and this is the test that we run at Foundation Medicine, where we're assaying 315 genes. The claims being made around this particular test include that the full exonic coding sequences of all 315 genes will be assayed, and there are statements about validated accuracy, coverage, and the amount of input tissue. So how did we establish this? Well, we'll go back to this picture of the next-generation sequencing assay workflow to say that, in the absence of a regulatory environment that prescribed what to do, we had the opportunity, the resources, and the personnel with the background to set up an extraordinarily rigorous validation that allowed us to understand our test performance characteristics in a way that is very important to the quality of the data that we're delivering. So in brief, DNA and/or RNA is extracted from a block or a slide of the formalin-fixed, paraffin-embedded tissue. This step, and the preanalytic variables that happen in the collecting lab, were evaluated for their impact on downstream processes and extensively optimized, as I'll show you in the coming slides. In the next step, the DNA that is extracted from the sample is made into what we call a library. This is the genomic DNA representing a mixture of all of the chromosomes, all of the DNA content present in both the normal cells and the tumor cells in that initial block of tissue. Now, how do we focus in on those 315 genes out of the thousands of genes that are possible in this library? The approach is a technique called hybrid capture. This is one of several sequence enrichment approaches that can be utilized, and again, this step was optimized and validated. Once this wet chemistry part of the assay is performed, it's loaded onto what we all think of as the box, the Illumina HiSeq. And then the sequence that comes out the other end has to go through computational biology processes to call out the different mutations in any one or all of the genes on the assay.
And each of these analytic pipeline algorithms also has to be validated for its accuracy in calling real events from the sequencer output in a particular sample. And then this needs to be matched to a reference sequence: even if we detect a difference in sequence, it needs to be matched to a database that describes normal human variation versus the variants that are seen in tumor cells. This leads eventually to a clinical report being issued. So the next few slides are going to show you some data, not because I'm going to spend very much time on them, in fact I'm only going to fly through these very quickly, but so that you understand the complexity of the type of evaluation that needs to be performed. So here is a first slide which shows the impact of DNA extraction before and after optimization. This scatter plot shows a 100% pure breast cancer tumor where no optimization has been performed, and if you don't know what you're looking at, it just looks like a bunch of dots. But when you compare it to what a fully optimized sample preparation shows, you can see that we now see discrete bands within this chromosome extracted from this tumor. And this allows us to reduce the amount of tumor in the sample from 100% down to what's more clinically realistic, 20%, and still pick out the loss of an important gene that's seen in lobular breast cancer, known as CDH1. As I mentioned, we have to validate a variety of different things in this testing world. We have to be able to call out any alteration type, substitutions, insertions and deletions, amplifications and homozygous deletions, as well as fusions, at any position in the 315 genes, which is over a million bases of individual coding regions, and be able to detect this at any mutant allele frequency from 1% to 100%. So you can imagine the complexity of designing a positive control that would push the assay and let us evaluate all three of these parameters. There isn't a patient sample, and there wasn't at the time a reference sample that could be purchased, that was complex enough to evaluate all of these parameters simultaneously. So what we did is we created a pool from a variety of cell lines where the DNA mutations were very well known, and this allowed us to model somatic mutations. The beauty of the cell line approach is that there are tumor cells and matched normal cells from the same patient, so that by combining these in different ratios, you can understand the performance of the assay down to very low, or up to high, mutant allele frequencies. These next few slides just underscore the numerous experiments that we did to show that we could detect mutant allele frequencies of less than 5% of the total DNA across a large number of the different genes covered in the test. We also looked at the performance over various amounts of median exon coverage, to understand where we needed to put a quality control cutoff, so that if our test wasn't performing up to this specification on a particular day, we would repeat the test rather than release a potentially incorrect result. We repeated these over time and looked at the correlation between the measured mutant allele frequency and what was expected based on the proportion of the normal and the abnormal cells that were added to the mixture, and we saw a linear relationship.
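As a rough illustration of the cell line mixing idea, here is a minimal sketch assuming a heterozygous variant in a diploid tumor cell line mixed with its matched normal line; the function names and target values are hypothetical and chosen only to show the arithmetic behind a dilution series, not the lab's actual protocol.

```python
def mixture_maf(tumor_fraction, mutant_copies=1, tumor_ploidy=2, normal_ploidy=2):
    """Expected mutant allele frequency when tumor-line DNA is mixed with
    matched normal-line DNA at a given tumor fraction (by DNA amount)."""
    mutant_alleles = tumor_fraction * mutant_copies
    total_alleles = tumor_fraction * tumor_ploidy + (1 - tumor_fraction) * normal_ploidy
    return mutant_alleles / total_alleles

def tumor_fraction_for_maf(target_maf, mutant_copies=1, tumor_ploidy=2, normal_ploidy=2):
    """Tumor fraction needed to model a target MAF (same assumptions as above)."""
    # Solves target = f*m / (f*tp + (1-f)*np) for f.
    return target_maf * normal_ploidy / (mutant_copies - target_maf * (tumor_ploidy - normal_ploidy))

# A hypothetical dilution series for a heterozygous variant:
for f in (0.5, 0.2, 0.1, 0.05):
    print(f"tumor-line fraction {f:.0%} -> expected MAF {mixture_maf(f):.1%}")

# How much tumor-line DNA would model a 5% MAF variant?
# Under these assumptions the answer is a 10% tumor-line fraction.
print(f"target MAF 5% -> tumor-line fraction {tumor_fraction_for_maf(0.05):.0%}")
```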
We also did these same sorts of experiments on cell lines that had known insertion and deletion events, to confirm we could detect these at different mutant allele frequencies. And we repeated this for copy number alterations. Cell lines with mixtures of different homozygous deletions or amplifications of particular genes were also challenged in this way, and repeated, so that we understood that when we had a 20% tumor fraction and a gene was amplified at eight copies or more, we had a sensitivity of detection of 93%. And this went up to 100% if our tumor fraction was more than 30% of the total DNA we were extracting. This allows us to confidently give results on specimens that are gauged at a tumor content of 30%, and to qualify results for patients who have a tumor fraction of less than 30% if we find reason to believe that a copy number call may have been missed in a particular sample. We also tested our platform against other tests that were available on the market, such as the Piquinone, across a large variety of FFPE samples, and looked at the concordance between the calls of both panels. There was overlap of 97% of the mutations, with a few more being called by NGS. And then we looked at the additional mutations: the ones detected by NGS and not by the Piquinone were the ones at lower mutant allele frequencies, so they are likely true calls that were below the lower limit of detection of the orthogonal platform. We also tested this against FISH and IHC results, with excellent concordance, and ran multiple experiments to look at the reproducibility between sequencing results from the same specimen in inter- and intra-batch comparisons. And we did this reproducibility testing over time, across months and months, with 79 and 71 replicates of two different tumor samples where we knew which alterations we were looking for. Every time, we called the exact same alterations, and we called them at almost the same mutant allele frequency, which is the little bit of zigzagging of the line that you see here. In every case, all three alterations were detected. And this resulted in our ability to describe our analytic validation results based on sensitivity and positive predictive value across a range of mutant allele frequencies, i.e., the tumor content present in a specimen, for all categories of alterations. And we didn't just submit this to CLIA or CAP or MolDX, but submitted it to a group that had no stake in verifying these results other than scientific interest. This was the submission and publication of the analytic validity in Nature Biotechnology. The kind of scrutiny that the scientists on these editorial review boards apply is much higher than the scrutiny performed at a regulatory level. And on top of this, they require you to submit all of the data, so that they can go through it with a fine-toothed comb and make sure that you've drawn the correct conclusions. So if you go to this publication, there is also extensive supplementary data that includes the raw sequencing and mutant allele calls, so that anybody can draw their own conclusions about the validity of the test. So what are the implications for patient care? Here are three examples from our experience with an ongoing internal quality assurance process for a subset of lung cancer cases.
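To see why tumor fraction matters so much for amplification calls, here is a minimal sketch of the expected coverage ratio for an amplified gene diluted by normal DNA; the eight-copy and 20%/30% figures echo the numbers quoted above, but the formula is just standard mixture arithmetic and is not the lab's actual copy number caller.

```python
def expected_copy_ratio(tumor_fraction, tumor_copies, normal_copies=2):
    """Expected coverage ratio (mixed sample vs. a diploid reference) for a gene
    present at `tumor_copies` in the tumor cells and diluted by normal cells."""
    mixed_copies = tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    return mixed_copies / normal_copies

# An eight-copy amplification, as in the example above:
for tf in (0.2, 0.3, 1.0):
    print(f"tumor fraction {tf:.0%}: expected coverage ratio {expected_copy_ratio(tf, 8):.2f}x")
# 20% tumor -> 1.6x, 30% -> 1.9x, 100% -> 4.0x: the amplification signal a
# caller has to separate from noise shrinks quickly as tumor content drops.
```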
So we know that the NCCN and a variety of other guideline-issuing agencies recommend both EGFR and ALK testing be performed on patient samples to direct care. So we were curious about the samples where we had identified an EGFR exon 19 deletion, which is known to activate this gene: how many of these specimens had been previously tested, and what were the results? Did they agree with what we had seen? So we looked at a variety of cases. We had 250 of these where the pathology reports were available and reviewed them for the presence or absence of previous testing results, and this information was available for 71 cases. We identified that 12 cases had prior negative testing results, which represents a 17% false negative rate. And you might say, how do you know that these were true positives and not false positives detected by the assay? Well, the clinical information and the treatment that followed for one of these patients support these being true results. Here's a patient who benefited from empiric erlotinib therapy despite having been given a negative EGFR testing result; she fell into this category of a positive result by the NGS assay and a previous negative result. When we look at the less common alterations that are just outside of the classic range, 83% of these patients were missed by orthogonal methods. And again, here's an example of a patient who responded to EGFR-targeted therapy. We repeated this evaluation looking at the cases that we identified as ALK positive, and found that, similarly, about 32% of the cases we identified as ALK rearranged had been previously called negative by FISH testing. Most importantly, of these patients who were then subsequently treated with crizotinib, 70% responded to therapy, which is the same response rate seen in patients whose FISH results are positive. So again, these are true biologic positives that were negative by the FISH methodology. And finally, here's an example of cases that were evaluated by a very well-known laboratory that had performed testing for a variety of markers, all of the ones in the NCCN guidelines, by a combination of modalities, including hotspot testing and multiplex sizing assays for EGFR, HER2, KRAS, BRAF, and a few other genes, as well as FISH assays to identify ALK, ROS1, and RET rearrangements. These patients were negative for all of these markers by the prior testing. When these samples were run on the NGS-based profiling assay, a quarter of them had alterations within the genes recommended for testing by the NCCN guidelines. So a quarter of these pan-negative patients, tested with the best-in-class standard-of-care testing at the time, had alterations that had not been recognized. An additional 40% had alterations that allowed them to enroll in a clinical trial for a targeted therapy agent that was available at their treatment institution. So in summary, when you're thinking about validation, you need to remember that the quality of the lab's validation and their understanding of their performance characteristics very much impacts patient care. Some key questions you might consider asking of a lab that's presenting you with the possibility of a test: number one, does the lab either have a peer-reviewed, published analytic validation, or have they successfully completed the MolDX technical assessment? If not, would the lab provide you with the raw data from their validation for review?
Is the lab New York State approved? Were the validation specimens that were utilized representative of actual patient samples? Meaning, are they complex enough, and do they reflect the low tumor purity of the samples the lab is likely to encounter in clinical testing? Did they validate all of the types of alterations that would be represented in clinical testing? Were the sizes of the validation sets large enough, and were the statistics appropriate, to ensure narrow confidence intervals? Was the entire process, from extraction all the way through reporting, validated, and to a degree that ensures reproducibility and robustness? If a comparator method was used, what was it, and is the data available? And finally, does the assay validation include intra-assay and inter-assay precision studies between different operators over multiple days?
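On the confidence interval question, here is a quick back-of-the-envelope sketch using a generic Wilson score interval (my choice of method, not something specified by any of the guidances discussed): even a validation set in which every positive specimen is detected carries a lower confidence bound on sensitivity that depends heavily on how many specimens were tested.

```python
import math

def wilson_lower_bound(successes, n, z=1.96):
    """Lower bound of the Wilson score 95% confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half

# If every positive specimen is detected, how low could the true sensitivity
# still plausibly be, for different validation set sizes?
for n in (10, 50, 200, 1000):
    print(f"{n:>4} positive specimens, all detected: sensitivity >= {wilson_lower_bound(n, n):.1%}")
```

With only 10 specimens, a perfect observed result is still compatible with a true sensitivity around 72%, which is why the size of the validation set and the statistics behind the claims are worth asking about.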