So the topic for this afternoon is somatic point mutations. Relative to the large-scale structural alterations in the genome that span kilobases or megabases, this afternoon we're going to focus on the small events, nucleotide-level changes such as this G-to-T mutation that happens to be in P53, which is a tumor suppressor gene. Inactivation of P53 by mutation allows cancer cells, for example, to evade programmed cell death and DNA repair. So the first thing I want to focus on is that there are certain patterns of mutations that accrue in genes across different tumors that can tell us something about the impact of a mutation in these genes. For example, shown here is the PI3 kinase protein, and each one of these glyphs represents a mutation in a specific patient's tumor. You can see that the distribution of these mutations in PI3 kinase is highly focused on two parts of the protein, one in the helical domain and the other in the kinase domain. When mutations across a population pile up on top of each other like this, there's a strong indication that there's something important about mutations in those amino acids that gives a phenotypic advantage to cancer cells, and so they get selected for independently in different tumors. These are known as hotspot mutations. They typically are amino acid changes that alter the function of the protein. We call them gain-of-function or switch-of-function, such that the protein mostly stays intact, but the DNA base change at a specific codon induces an amino acid change that creates a new function for the mutant protein. Another example here is IDH1, isocitrate dehydrogenase, in glioblastomas, and you can see just how specific the mutations found in this gene really are. They all happen at the same amino acid location in the protein. So these are hotspot mutations that typically result in a change of function.
On the bottom, these two genes have a much different pattern. Here the missense mutations that induce amino acid changes are often accompanied by loss-of-function or truncating mutations. These are mutations in the DNA that induce a premature stop codon, so the protein is truncated and often degraded, and it doesn't have its full functional repertoire. So here you can see that the distribution of mutations across the RB1 gene is actually throughout the protein, and that's because there are many, many different ways to inactivate a protein through mutation. As long as the protein is inactivated, that's the important part of the mutation distribution. Similarly, another example is the von Hippel-Lindau gene. Kidney cancers will exhibit these types of mutations in von Hippel-Lindau, and you can see again that there's a spread of the truncating mutations throughout the protein. So these are really two important classes of functional mutations, and they have really distinct patterns of how they accrue in tumors across populations. With that in mind, you all know by now that there are several massive initiatives underway to try to find these types of mutations, recurrent hotspot mutations or loss-of-function mutations, to identify new tumor suppressor genes or new oncogenes. The International Cancer Genome Consortium and The Cancer Genome Atlas project were really designed to ask whether, by sampling across the whole genome in an unbiased way, new mutations like this could be discovered. It's really to try to explain the biological properties of cancer that are depicted here in this hallmarks of cancer paper by Hanahan and Weinberg. These are the phenotypic attributes of a cancer, and we want to be able to explain how cancers acquire those phenotypic attributes through mutation of the genome.
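The two patterns described above, mutations piling up at one position versus truncating mutations spread along the protein, can be summarized with two simple numbers per gene. The following is only a minimal sketch: the function name `summarize_gene`, the mutation class labels, and the toy data are all hypothetical and are not taken from the lecture or from any published method.

```python
from collections import Counter

# Mutation classes treated as truncating / loss-of-function in this sketch.
TRUNCATING = {"nonsense", "frameshift", "splice_site"}

def summarize_gene(mutations):
    """Summarize one gene's mutations across tumors.

    mutations: list of (amino_acid_position, mutation_class), one per tumor.
    Returns the fraction of mutations at the single most common position
    (high for hotspot genes) and the fraction that are truncating
    (high for inactivated tumor suppressors).
    """
    n = len(mutations)
    positions = Counter(pos for pos, _ in mutations)
    top_pos, top_count = positions.most_common(1)[0]
    truncating = sum(1 for _, cls in mutations if cls in TRUNCATING)
    return {
        "n": n,
        "top_position": top_pos,
        "top_position_fraction": top_count / n,
        "truncating_fraction": truncating / n,
    }

# Toy data (invented for illustration only).
hotspot_like = [(132, "missense")] * 9 + [(100, "missense")]
dispersed_like = [
    (50, "nonsense"), (210, "frameshift"), (333, "missense"), (455, "nonsense"),
    (610, "splice_site"), (702, "frameshift"), (810, "nonsense"), (90, "missense"),
]

print(summarize_gene(hotspot_like))
print(summarize_gene(dispersed_like))
```

In the toy hotspot gene nearly all mutations share one position and none truncate, while in the dispersed gene no position dominates and most mutations are truncating.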
So this is a synthesis of the initial phase of the TCGA, whereby 12 cancer types were analyzed together to look at the landscape of mutations across these 12 cancer types, and I think this comprises about 3,000 different cancers. The idea here was to look at genes that were mutated more often than would be expected by chance, and then to show the distribution of mutations across these different tissue types. This figure is probably very hard to read, but the genes are the rows and the columns are the tumor types. One thing of note is that you can see immediately there are some genes that are recurrently mutated in many different tumor types. So this is the P53 tumor suppressor gene, and the color of the box here represents how often that protein is mutated in that cancer type. There's a lot of red across all cancer types here, and it suggests that P53 really is the most heavily mutated gene in cancer. These are patient tumors, yeah. And then here's an example of a very highly tumor-type-specific mutation. This is the von Hippel-Lindau protein, and it really only occurs in kidney cancer. So there's something about the cell type of origin such that when von Hippel-Lindau is mutated in that cell type, that creates a malignancy, but probably not in the other tissue types. Okay. Hemangioblastoma. Sorry? Hemangioblastoma. Okay, thank you. So here's an example, this is KRAS here. KRAS is highly mutated in colorectal cancers, and then we have genes like PI3 kinase which also exhibit point mutations across the spectrum of tumors. And then here's APC, which is again specific to colorectal cancers for the most part. What's interesting about this type of analysis is that the genes here are grouped by biological function, okay. So for example, you have genome integrity, you have TGF beta, you have PI3 kinase signaling, MAPK signaling, et cetera.
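The idea mentioned above, looking for genes mutated more often than expected by chance, can be illustrated with a deliberately simplified binomial model. This is a sketch under strong assumptions that are not from the lecture: a single uniform background mutation rate per base (real analyses model gene-specific and tumor-specific background rates), and hypothetical function names and example numbers.

```python
import math

def gene_mutation_prob(background_rate, coding_length):
    """Probability that a gene carries at least one mutation in one tumor,
    assuming an independent, uniform per-base background rate."""
    return 1.0 - (1.0 - background_rate) ** coding_length

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so that large
    binomial coefficients do not overflow. Requires 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

# Hypothetical numbers: 3,000 tumors, a 1,500-base coding sequence,
# and a background rate of one mutation per million bases.
p_gene = gene_mutation_prob(1e-6, 1500)
print("expected mutated tumors:", 3000 * p_gene)
print("P(>= 40 mutated tumors):", binom_tail(40, 3000, p_gene))
print("P(>= 3 mutated tumors): ", binom_tail(3, 3000, p_gene))
```

Under these assumptions a gene mutated in 40 tumors is far beyond the chance expectation of about 4.5, while 3 mutated tumors is unremarkable.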
So these are pathways that we knew a lot about prior to the TCGA, but what was really quite surprising is that some biological functions, such as histone modification and even global properties of splicing, are actually affected by mutation in specific genes in given tumor types. The impact of mutations in genes responsible for epigenetic regulation, for example those affecting chromosomal structure, is a whole new theme that has emerged with the identification of mutations in core histones, in chromatin remodeling complexes, and in other functions that are involved in how DNA is organized in a cell. So there is a whole stream of activity, from understanding mechanism through to therapeutics, that's been triggered by these discoveries, which I don't think would have been obvious prior to unbiased screens across the genome. Similarly for splicing, recurrent mutations in splicing factors in certain tumor types are a key result that has emerged from these large-scale unbiased studies. That's not pancreatic, that's pan-cancer, it means across the spectrum. So yeah, this is like a synthesis, okay. It's hard to see, but P53 probably has the strongest signal here. Yeah, across everything, that's right. Colors are frequency? Yes, right, colors are frequency within the tumor type. Okay, so in terms of clinical utility, there's lots of clinical utility in knowing the mutation status of particular genes. Some mutations will define a cancer, so if we have a high-grade serous ovarian cancer that's lacking a P53 mutation, then we get very suspicious of whether it's been misdiagnosed, for example, because the mutation really should be there in every case.
There are lots of emerging companion diagnostics for targeted therapeutics. The most common are tests for EGFR mutations in lung cancer for anti-EGFR tyrosine kinase inhibitors. For melanomas, there are tests for specific mutations in BRAF for administration of vemurafenib and other BRAF inhibitors. In colorectal cancer, the presence of a mutation in KRAS codons 12 and 13, and now actually codon 61 as well, with HRAS and NRAS being added to the list, is a contraindication for administration of cetuximab. There are a number of other indications and contraindications listed at this website here. Then of course mutations also accrue and emerge as potent markers of drug resistance. For example, secondary mutations in EGFR have emerged in lung cancers in the presence of anti-EGFR therapy, so the T790M mutation in lung cancers is a recurrent and very strong indicator of resistance to that treatment. Now there's actually a clinical test that's been approved in Europe, and I think will soon be approved in the US, to test for T790M as a contraindication for anti-EGFR therapy and as an indication to switch therapies so that those patients don't experience relapse. A contraindication means that if the mutation is present, then the tumor likely will not respond to the therapy, so the therapy should not be administered in the presence of the mutation. And then, we talked a little bit about PARP inhibitors in ovarian cancer this morning. In terms of just standard-of-care cisplatin therapy, BRCA reversion mutations are seen in a subset of cases, whereby you may have a germline loss-of-function mutation that results in a truncated protein due to a frameshift, and in the tumor cells, in the presence of therapy, a new mutation can arise that restores the function of BRCA.
So the vulnerability that's being targeted by administration of cisplatin is then no longer present in the cancer cells, and so they stop responding to those therapies. So these are important mutations that have clinical significance and utility. Okay, so the main classes of point mutations in the coding regions of cancer cells start with missense mutations, where a single base substitution alters the amino acid sequence of a protein. That's pretty straightforward. Of course we know that the coding of amino acids is organized into codons, which are three-base units of the genome, and there are multiple different ways in which a missense mutation can occur within a codon, but basically that's the definition of a missense mutation. A silent mutation is a base change in the genome that does not result in an amino acid change. Silent or synonymous mutations are considered really just background mutations that don't have an impact, although there's some evidence to suggest that some mutations that we think are silent or synonymous actually do have some sort of impact, and it often depends on the splice isoform that you're looking at. We have nonsense or truncating mutations, where a single base substitution introduces a premature stop codon, and that results in a loss-of-function mutation. So does a frameshift, which can be a small deletion or insertion that changes the reading frame of the transcript and will result in a premature stop codon. And maybe one more class that's not included here is a splice site mutation, which has a similar effect. So they can be grouped into loss-of-function mutations and change-of-function mutations. Yeah. Just a sort of curiosity question, since you guys are doing all this cancer evolution work: has there been any evidence to show that you can have silent mutations that, because you've made that switch, make it easier to make another switch down the road? Oh yeah, that's a really good question.
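The mutation classes defined above (missense, silent, nonsense, frameshift) follow mechanically from the codon table, so they can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the table is the standard genetic code, and splice-site effects and isoform-specific reading frames are ignored.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(codon, offset, alt_base):
    """Classify a single-base substitution inside one codon.

    codon: reference codon on the coding strand, e.g. "CGT"
    offset: 0, 1 or 2, the position within the codon that changes
    alt_base: the new base
    Returns (class, reference_amino_acid, mutant_amino_acid); "*" is stop.
    """
    mutant = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa, mut_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == mut_aa:
        return ("silent", ref_aa, mut_aa)
    if mut_aa == "*":
        return ("nonsense", ref_aa, mut_aa)
    if ref_aa == "*":
        return ("stop_lost", ref_aa, mut_aa)
    return ("missense", ref_aa, mut_aa)

def is_frameshift(indel_length):
    """A small insertion or deletion shifts the reading frame unless its
    length is a multiple of three."""
    return indel_length % 3 != 0

print(classify_substitution("CGT", 1, "A"))  # arginine codon to histidine codon
print(classify_substitution("CGT", 2, "C"))  # still arginine
print(classify_substitution("CGA", 0, "T"))  # arginine codon to a stop codon
print(is_frameshift(1), is_frameshift(3))
```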
So the interaction of mutations and the evolutionary timing of mutations, I think there are a lot of people working on that question. I don't know if we have evidence that that's true, but I think everyone sort of believes that's probably the case. So some mutation might prime the cell for the eventual acquisition of a mutation that really changes the phenotype, and so you have a necessary but not sufficient condition. So here's a really interesting case example of discovery as well. This is just data I pulled out of the cBioPortal for glioblastoma. Have you looked at cBioPortal yet in the workshop? No? Okay. Is cBioPortal part of the curriculum? No? Okay. All right. So cBioPortal, you can Google it, is a very nice interface to the TCGA results, and there are all kinds of ways to navigate the data through a web-based interface, and this is just one of them. All I did was plug in essentially glioblastoma and IDH1, and you can look at essentially all the instances of IDH1 mutations that are in the database. And then this lollipop plot essentially shows where the mutations are occurring on the protein. I showed an example of this already. But this is an important discovery in glioblastoma. Isocitrate dehydrogenase 1 is a metabolism gene, and what was really intriguing is that this is a mutation that was recurrent. You would not have guessed that there would be mutations in this protein, and the nature of the protein really has opened up a whole new investigation into the role of metabolism in these and other disorders. What's remarkable is that these mutations were discovered in 2008. Earlier this year the mechanism of action of this mutation was published by Brad Bernstein's lab, and it beautifully shows that in the presence of an IDH1 mutation there's a consequent and concurrent up-regulation of platelet-derived growth factor receptor alpha.
And what's very interesting about this is that these mutations are essentially mutually exclusive with cases that have amplifications in that platelet-derived growth factor receptor alpha gene. So this is just a different way of up-regulating platelet-derived growth factor receptor alpha, and it provides strong evidence that IDH1-mutant tumors likely will respond to inhibitors of that up-regulated protein. So it's a really nice story from discovery through to mechanism through to potential treatment options, all within the span of a decade, and clearly the result of an unbiased sequencing screen of different tumors. A really potent example. Another example, slightly less impactful just due to the rarity of the tumor type, is this one here. This is work that I was involved in. We were studying rare forms of ovarian cancer, and we were actually sequencing the transcriptomes of these cases, and we noticed that our index cases all exhibited exactly the same C-to-G mutation at this particular locus. This is just showing the cDNA. You can see that all tumors have this C-to-G mutation here. That results in a dramatic missense amino acid change, and when we profiled a large set of additional granulosa cell tumors of the ovary we found that essentially every single case had this identical mutation. It's now what we would call a pathognomonic mutation that essentially defines the disease. Some of these non-epithelial ovarian cancers can be difficult to diagnose, and so this provides a precise molecular diagnostic that is the defining feature of the disease, and it is now actually implemented in a couple of different countries as a molecular diagnostic. The gene is a transcription factor that regulates differentiation of these granulosa cells, and it's really only expressed in that cell type. So again, this is probably why we hadn't seen it before in any other tumor types, just because people hadn't studied this really rare form of ovarian cancer.
So I talked about PI3 kinase activating oncogenic mutations. This is just another example of these, so we have mutations accruing in the different domains of PI3 kinase, and there have been lots of targeted PI3 kinase inhibitors developed in cancer. They tend not to work extremely well, unfortunately, but how to target these mutations is an active area of research. I mentioned KRAS codon 12 and I mentioned BRAF V600E, so I think I can skip this. This just shows that these mutations in PI3 kinase drive downstream signaling of AKT, and so this is the pathway that is affected by the mutations accruing in that particular kinase. You have different phosphorylation states that accrue as a result of a mutation of that protein. Okay, and then zooming in on tumor suppressor loss of function. Yes? Well, yes and no. I mean, it will probably be wired up slightly differently in every cell context, maybe even in every tumor, but this is from KEGG, the Kyoto Encyclopedia of Genes and Genomes, and essentially it's a curated pathway set based on evidence in the literature. So has anyone taken cells that actually have, let's see, which of those actually changed expression? They have, yeah. Lew Cantley is probably the discoverer of this pathway and its relevance in cancer, and he has dissected this pathway quite robustly. There are probably 10,000 papers on this pathway, I guess, something like that, maybe more. All right, so this is the pattern of loss-of-function mutation. This is ARID1A, which we discussed this morning. This is a paper from David Huntsman's group. Again we sequenced a number of index cases and found this very characteristic pattern of loss-of-function mutations in the index cases, and then we sequenced through the gene in a large number of cases, around 300, and found again this pattern of mutations spread throughout the protein, indicating these are most likely loss-of-function mutations.
What's really quite interesting about this paper is that the endometriosis-associated ovarian cancers represent a large percentage of the non-high-grade serous ovarian cancers, and so constitute about 20% of ovarian cancers, and for some of these cases we had the pre-cancerous lesions in the tumor bank. So we looked for the presence of these ARID1A mutations in those pre-cancerous lesions, and sure enough they were there. It really suggests that these are potentially very, very early mutations, and again maybe they satisfy this idea that they're necessary but not sufficient for malignancy. So acquisition of these mutations is very early, and then additional mutations accrue that drive the malignancy. So this is just a table, and I actually have already talked about all of these things. This is a bit of a duplicate slide, but maybe a little bit better presented than the previous slides. It shows some of the important mutations that are currently being tested, for which there are targeted agents and on-label indications, so that a clinician can make a prescribing choice based on the knowledge that the mutation is present. So BRAF V600E: vemurafenib is effective in patients with unresectable or metastatic melanomas, and widespread metastatic cases have been shown to respond really quite beautifully. Ultimately they probably all will relapse with some resistant disease, because it's such a potent and specific target that the selective pressure from these agents invariably selects for resistance. And there's an example right here that I've already discussed, the T790M mutation in EGFR in these exon 19 deletion or L858R-mutant EGFR lung cancers. And then we talked about the contraindications for KRAS. So this is the big paper that described the response of BRAF-mutant tumors to vemurafenib. This is shown with a waterfall plot here that essentially shows tumor growth.
And you can see that each one of these bars here represents a specific tumor. These are BRAF V600E mutant tumors that were treated with vemurafenib, and you can see the majority exhibit a response of some kind, where they regressed and showed tumor shrinkage. Whereas with standard of care or another drug here, the response wasn't nearly as uniform across the spectrum. So this is great, and this shows some real promise for metastatic melanomas. However, one has to use this with caution. BRAF V600E mutations also occur in colorectal cancers at a pretty high rate, and so there was some degree of excitement that perhaps this compound could be used to treat colorectal cancers, and that they might exhibit a similar response to what was observed in melanomas. So this PLX here is essentially vemurafenib. What's shown in this plot is tumor volume in xenograft models, whereby over the course of days you can track tumor growth after administration of the treatment. You can see that treatment was started after ten days. The control, and also cetuximab and vemurafenib, all show a lack of response to treatment. So this compound on its own is actually ineffective in BRAF-mutant colorectal cells. So the cell context is important. The reason is that colorectal cells express EGFR, and so it becomes an escape valve for those cancers. You suppress BRAF and then the cell up-regulates EGFR. It's kind of like a whack-a-mole idea: you whack one mole and it comes up somewhere else. So then what this group did is essentially test the combination of vemurafenib plus anti-EGFR monoclonal antibodies, and they showed that the tumors were responsive to the combination. So the mutation alone is often not sufficient to predict response to a targeted agent. We need the cell context, and we need to know what the expression program of that cell context is. Okay.
So another relatively new discovery, from 2013, not too long ago, was the description of mutations in promoters. These are functional elements of the genome, and so it's not surprising that mutations might have an impact if they accrue in promoters. But prior to this, almost every significant mutation that had been described in the literature was in a coding region of a gene. Okay. So this paper revealed that both in familial and sporadic melanoma there were recurrent mutations that created binding sites for ETS family transcription factors at these very specific loci in the genome. Here's the start codon, and so the mutation is upstream of the start codon in both these cases. Then, in the very same issue of Science, these are back-to-back papers, this one showed that the impact of those mutations was to change the expression of particular proteins, and that's essentially what's described here. So the mutations are not only recurrent, but they actually functionally impact the expression of those genes. Okay. So that gives you some examples of, I think, gene-based analysis of mutations. But let's think now about going beyond single genes. What does the pattern of mutation across the genome tell us about the biology of the mutations and the biology of the cancers? There are two properties that we can ascertain through whole-genome or whole-exome analysis. One is the mutation rate: how many mutations does a tumor have, and is that some sort of indication of underlying biology? The second is the mutational signature: what processes may have given rise to those mutations? What are the mutagenic processes that are operating in those tumors, and can those processes tell us something about the nature of the cancer? There are really two main efforts in this area that are significant, and one is from the Broad Institute. This is the Lawrence paper from 2013.
Something's funky about the reference there, but the DOI is here. You can look at the paper, and what this shows is a number of important properties of how mutations accrue. The way to read this plot is that the distance from the center to the outside represents essentially the abundance of mutations in a given tumor. Each dot is a specific tumor that has been sequenced, and then around this donut, as they call it, you can see the distribution of the types of mutations that accrue. So these black dots are melanomas, and you can see that the melanomas are primarily characterized by harboring C-to-T mutations, and also a lot of the cases are at the outer edge of the donut here. So what's going on here? Why do melanomas have C-to-T mutations? Yeah, right. So exactly. What UV exposure does to DNA is it creates a deamination of cytosines and creates C-to-T mutations in the genome. So these might be people that have been exposed to the sun, maybe in ozone-depleted regions or where they have lots of exposure, and so there's an accumulation of mutations due to UV exposure. So how about here, so yeah. So if it's UV exposure, shouldn't normal people exposed to the same amount of sun also have mutations in the skin? Yeah, so I think it's very well understood now that probably all of us are walking around with mutations in our normal cells. That's not unexpected. There's actually a recent paper, I think last year, again from Peter Campbell's group, where they looked at skin that was removed in eyelid surgery, so really protected skin, and even those cells harbored a lot of mutations. So you can imagine that other people exposed to UV would definitely harbor mutations, but of course it's a function of how well you protect yourself if you go out in the sun. Some people wear sunscreen, some people don't, some people have more natural pigment, other people don't, etc.
Well, the question goes to how specific is that for melanoma? Well, it's highly specific to melanoma. That's what this plot shows. The colors are the tumor types, and in this zone of the plot you really only see black dots. Yes? I think it's the relative distribution, so these ones would have also TPC, some of the mutations would be TPC, but less so C to A, so they're shifted towards that. So then we can look at these red dots, and these are lung cancers. We have lung adenocarcinomas and lung squamous cell carcinomas. These are C-to-A mutations, so take a guess what's going on here. Sorry? Something to do with cigarettes. Yeah, so what cigarette smoking does is it creates C-to-A mutations in the genome. In a couple of papers that have been published comparing lung cancers from lifetime smokers versus lung cancers from never-smokers, the number of mutations is on the order of 10 times higher in the lung cancers of smokers. So it's direct evidence. You remember the insurance battles from the 1980s. I don't know if any of you have read Siddhartha Mukherjee's book, The Emperor of All Maladies: A Biography of Cancer. I highly recommend reading it. It documents essentially the battle that people undertook to prove that smoking was a cause of lung cancer. And really this was the evidence that was always lacking. We needed this evidence back then. It would have been useful and probably would have advanced the field by 30 years in terms of prevention. Nonetheless, that's what's going on. So these two examples are exogenous factors, and the spectrum can really reveal the mutational insult that led to these cancers. And then we have a number of different spectra as well. I'll go into those in a bit of detail. Then we can look at the actual mutation load as well. Here we see that the cases with the highest mutational load are lung cancers and some uterine cancers up here, and some colorectal tumors out here.
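The mutation types discussed above (C to T in melanoma, C to A in smokers' lung cancers) are usually tallied as six substitution classes, because a change read from one strand is the same event as its complement on the other strand. The sketch below shows that bookkeeping only; the function names and the toy mutation list are hypothetical and are not from the Lawrence paper.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

def substitution_class(ref, alt):
    """Collapse a substitution onto the pyrimidine (C or T) reference base,
    so that, for example, G>A is counted together with C>T."""
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def spectrum(mutations):
    """Fraction of mutations in each of the six classes.

    mutations: iterable of (ref_base, alt_base) pairs for one tumor.
    """
    counts = Counter(substitution_class(r, a) for r, a in mutations)
    total = sum(counts.values())
    return {cls: counts.get(cls, 0) / total for cls in CLASSES}

# Toy tumor (invented): mostly C>T, as described for UV-exposed melanomas.
toy_tumor = [("C", "T"), ("G", "A"), ("C", "A"), ("T", "C")]
print(spectrum(toy_tumor))
```

Comparing these six fractions between tumors is the simplest form of the spectrum comparison the plot makes; the position around the donut reflects which classes dominate.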
So this is important. It has now really come to light that the number of mutations is actually a very potent indicator of response to immunotherapy. Immunotherapy was the single biggest topic at AACR this year. I don't know if anybody went. It's an emergent topic that is really dominating, I would say, the new progress in the field. Like any new exciting thing it is probably overhyped, but at the same time, one of the really exciting findings is that the cases that exhibit high numbers of mutations, usually through mismatch repair deficiency, are the cases that respond well to immunotherapies. That's because they're presenting lots of neoantigens for the immune system to target. So this is a sort of genome-based biomarker, if you will, that can stratify cases into treatment arms for immunotherapy. This is not necessarily related to cancer. Oh, yeah, yeah, yeah. So these are all somatic mutations that are not present anymore. Yeah, so that's exactly what the class of compounds would be, immune checkpoint inhibitors. But there are also other classes of therapies like CAR T-cells, and also actually just culturing the patient's own T-cells for patient-specific vaccines. Sorry, one last question. Yes. Is it more on disease models as well? Both, yeah. But the real plenary sessions are all on patients. So there is a myriad of clinical trials now that have been started up to test the efficacy of immune checkpoint blockade inhibitors, using, for example, mutation load as a biomarker. Yeah. What is the use of using? That's uterine cancer. Yeah, so actually we can go to here. This is that same figure that I showed, the same data plotted here. So there's a real range. These POLE mutants are the ultramutated cases up here. This is a log scale, by the way. So these are the POLE mutants. These are the mismatch repair deficient cases, which are this class here, MSI.
And then here you have the copy number cases. I think we've discussed the endometrial carcinomas already, and I don't know if you're going to take a look at that. In this case they used a number of fairly heuristic algorithms. There isn't a machine learning algorithm that was applied to develop the actual decision points; it really was a heuristic algorithm, which is this right here. Okay. All right. So now, how do we detect mutations? In NGS data, again, we can revisit this idea that cancer tissues are not homogeneous, in the sense that they also have admixed normal cells as part of the tissue. There is considerable heterogeneity in terms of the number of cells that harbor particular mutations. There is considerable genomic instability, as we saw this morning, with copy number changes, loss of heterozygosity, genomic rearrangements, et cetera. And then what's unique about cancer is that what we're interested in are mutations in cancer cells that are not found in the germline DNA, so capturing somatic mutations. This is a new experimental design that has to be accounted for. The main process is that when we receive data that comes in the form of short reads, these get aligned to the genome. We've sort of covered this already. Essentially the idea is that we can reduce this data to a set of allelic counts, and it looks something like this. We have the normal genome and we have the tumor genome, and we can start to look for regions where there are variants and again reduce these down to allelic counts. So this is the number of reference bases, and this is the total depth. We can take this matrix, compute these counts, and then leverage some statistical models on top of these counts to try to infer the somatic mutations that are present in the genome. So for example, here's a case where this is a homozygous polymorphism.
So it's different from the reference, but all the normal cells have this G-to-C alteration, and so the tumors naturally have it as well. We just count this as a germline polymorphism. This one here is a heterozygous germline polymorphism. Three out of the seven reads harbor a G to C. In the tumor it's three out of eight reads. So there's shared signal here, and again we think of this as a heterozygous germline variant. This red signal is really what we're after. Here we have reads piling up. None of the reads have a variant in the normal. The reference is A, and all the reads show an A. But in the tumor we have three out of six reads that have an A-to-C change, and this is what we call a somatic mutation. So now the trick is to try to infer these events, to ignore or flag these blue events differently, and also to avoid being tricked by artifactual noise in the data. We'll go over that. We can develop statistical models for this, and we've done a lot of this type of work. The key idea here is that we can leverage the fact that most of the variant signal that's present in the tumor will actually also be present in the normal. The number of polymorphisms in a given individual ranges from, let's say, 1.5 million to 3 million, and the number of somatic mutations is on the order of 5,000 to 10,000. So the actual somatic signal is dwarfed by the germline variation, and we need robust models that can distinguish the germline variation from the somatic variation. We've done a lot of work in that area. In the early days, revalidating these mutations was really key, because there were a lot of poorly understood noise properties of the sequencers that would essentially trick us into producing signals that looked like they were real mutations. So we undertook really extensive experimental validations in our early data sets. And the artifacts can be a result of many different factors.
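The three cases walked through above (homozygous germline, heterozygous germline, somatic) can be reproduced with a deliberately naive rule applied to the allelic counts. This is only a sketch: the lecture describes statistical models, not fixed cutoffs, so the thresholds `min_vaf`, `hom_vaf` and `min_alt_reads`, the function names, and the read strings for the homozygous case and the somatic normal are all assumptions made for illustration.

```python
def allelic_counts(ref_base, read_bases):
    """Reduce the bases observed at one position to (reference count, depth)."""
    depth = len(read_bases)
    ref_count = sum(1 for b in read_bases if b == ref_base)
    return ref_count, depth

def classify_site(normal, tumor, min_vaf=0.2, hom_vaf=0.9, min_alt_reads=2):
    """Naive threshold rule on paired (ref_count, depth) tuples.

    Any non-reference base is counted as variant. The thresholds are
    hypothetical and stand in for a proper statistical model.
    """
    n_ref, n_depth = normal
    t_ref, t_depth = tumor
    if n_depth == 0 or t_depth == 0:
        return "no_coverage"
    n_alt, t_alt = n_depth - n_ref, t_depth - t_ref
    n_vaf, t_vaf = n_alt / n_depth, t_alt / t_depth
    if n_alt == 0:
        # No variant reads in the normal: somatic if the tumor supports it.
        if t_alt >= min_alt_reads and t_vaf >= min_vaf:
            return "somatic"
        return "reference"
    if n_vaf >= hom_vaf:
        return "germline_homozygous"
    if n_vaf >= min_vaf:
        return "germline_heterozygous"
    return "uncertain"

# Homozygous G-to-C polymorphism (depths are assumed).
print(classify_site(allelic_counts("G", "CCCCCCC"), allelic_counts("G", "CCCCCCCC")))
# Heterozygous: 3 of 7 normal reads and 3 of 8 tumor reads carry the C.
print(classify_site(allelic_counts("G", "GGGGCCC"), allelic_counts("G", "GGGGGCCC")))
# Somatic: no variant reads in the normal, 3 of 6 tumor reads show A to C.
print(classify_site(allelic_counts("A", "AAAAAA"), allelic_counts("A", "AAACCC")))
```

A fixed cutoff like this ignores tumor purity, copy number and sequencing error, which is exactly why model-based callers are used in practice.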
So this is an example where these reads actually were somehow misaligned. The top track here, has everybody seen IGV? Yeah, have you seen IGV? Okay, so I don't need to explain this. It's great. So the top track here is the tumor data from a sample, and the bottom is the normal data. And for some reason these reads are not truly aligned to this locus; in fact they're misaligned. They should be aligning somewhere else. And this was specific to the tumor cells. This is actually a false positive. Although the signal looks like it's there in the tumor and it's not there in the normal, this is a false positive. How do you find out if those mappings were okay? Yeah, so we actually aligned it again using a more robust aligner and found that it actually should belong somewhere else. Is there a real reply? Yeah, so it's an imperfect process at this point, because the complexity of the problem is such that you might have a billion reads from a whole genome experiment that need to be aligned to the genome. So you have a billion hundred-base-pair reads that need to be aligned. It's a computationally massive problem, so each alignment has to be done extremely efficiently, and that comes at a cost of specificity. So that is an issue. If we could spend more compute time, we would do a better job, but then we would be waiting for a month to get our result, or longer. Can I ask a question? Is it simply that you just go back, figure out what those reads are, and pull those reads out specifically to realign? No, so this is retrospective. This is looking back. These are all positions that we revalidated and didn't find the mutation again, and so we try to look for explanations for what's going on here. And I'll show you a more systematic way that we address that. Again, another really important problem, and again this is due to alignments, is the presence of insertions and deletions.
So there's a structural rearrangement here, a structural variant that's causing misalignment of these reads. And so the gap is being inserted in the wrong place, and these are being read out as variants when they're really a result of misplacing the gap in an insertion. So that's a big one. The actual base callers of these machines have a probabilistic element to them. So in a similar way to how array hybridization works, there's actually an image analysis step at each cycle of the sequencing process. Did you go over the sequencing process at all, on Illumina machines? Yeah, okay. So in the image capture step, you can imagine that small fluctuations in space on the coordinates may result in uncertain calls from the image analysis. And so that's encoded. The certainty is encoded. And so some of the signals are weak and could be the result of low base quality. It's difficult to see here, but IGV renders the brightness of the letter as a function of the quality of the base call. And so these are just low quality base calls that are probably the wrong base call. Yeah? Is the degree to which a signal can be isolated from its neighbors a factor in that? Color intensity plus the, yeah, exactly, the bleed-through from adjacent clusters. So then another really important indicator of false positives is when all the reads come from the same strand, and this is typically the result of a PCR issue in the bridge amplification stage. So that can be accounted for as well. And then there are some artifacts that, you know, we just can't explain. In this case the tumor is on the bottom and the normal is on the top. And this one really was a confounding example. We just couldn't understand why this was not seen again. And it could be, in fact, that in this case the validation experiment was not correct, because all the signals point to this one being true, but there are some unknown, mysterious false positives.
And so here are some true positive examples that show you how difficult the problem can be. So this is an example where only a small percentage of reads actually harbor the mutation. It's a real mutation, but the signal is very weak. And this is very likely the result of a minor population of cells harboring the mutation. And this is sort of what it looks like: the majority of reads won't have the mutation, and so the signal is buried in a large space of no signal. If you're looking at the single cell, so if you find the cell with the mutation, then the signal should be clear, but then it's the problem of actually sampling the cell. So if it's a rare... No, I'm just wondering about the technical aspects. Yes, yeah. If we have a clone of the cell, you should be able to get it. Yes, exactly right. Yeah, so you can think of each of these reads as representing an allele in a mixture, in the pooled mixture. And that's also a sampling. So you have this pool, you have this bag of alleles, and you're reaching in and grabbing some and sequencing them. And so we can still have lots of allelic under-sampling. So a population that's, let's say, one in a thousand cells is very unlikely ever to be seen at the depth of coverage that we're used to in whole genome sequencing, like 50X. It's very unlikely that we'd ever see it. So there's a sampling bias in bulk as well as in single cell. So since you mentioned that one in a thousand you wouldn't see at 50X. Yes. If you want to see something that is one in a thousand, what coverage would you need? Yes, well, so there are power calculations that you can execute to calculate that. But, you know, a plan never survives contact with the enemy, right? So in reality, it's difficult to actually determine that. The theoretical estimates are there. You can make the power calculations. But then you have all kinds of steps along the experimental protocol that can introduce bias into that.
Nice, clean theoretical argument. But through binomial tests or Poisson sampling you can calculate precisely what you think the coverage would need to be if you want to have a 99% probability of sampling that allele. That's doable. Okay, so this one is a similar concept. It's very rare and buried in a sea of noise. The signal here is barely perceptible. Okay, so our approach to this has been to try to use machine learning-based classifiers to learn the properties of these data that yield false positives and the properties that yield true positives. And this is a method called MutationSeq, which we developed, which is based on calculating features of the data. So at each locus we can calculate a number of different properties, such as what is the average base quality, what is the average mapping quality, whether there are nearby insertions and deletions, and what is the strand bias. And then to borrow strength across the two datasets we can actually compute combined features, such as what is the difference in allele counts between the tumor and the normal, et cetera. So what this shows is a principal components analysis of this very large feature space. It's actually a 106-dimensional feature space, reduced here to three dimensions, and it clearly shows that separation is possible between somatic, germline, and wild-type events. If we only looked at allele counts, which is the first thing I introduced, then all these mutations would be called. But you can see that the majority are actually either false positives, these wild-types, or they're germline events that couldn't be picked up by allele counts alone. And so by expanding to this feature space and using machine learning-based classifiers, we were able to really start to get accurate results and improve the validation rates by large percentages. Yes?
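The binomial coverage calculation mentioned in this part of the talk can be sketched as follows. This is a minimal example under stated assumptions: reads are treated as independent draws, "one in a thousand" is taken directly as a variant allele fraction of 0.001 (a heterozygous mutation in a diploid region would give half that), and the function names and the choice of requiring at least one supporting read are illustrative. As noted in the talk, biases along the experimental protocol mean real experiments will not match this clean model.

```python
import math


def prob_at_least(k: int, depth: int, allele_fraction: float) -> float:
    """P(at least k variant reads) when each of `depth` reads independently
    carries the variant with probability `allele_fraction` (binomial model)."""
    p_fewer = sum(
        math.comb(depth, i)
        * allele_fraction ** i
        * (1.0 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


def min_depth(k: int, allele_fraction: float, power: float = 0.99,
              max_depth: int = 1_000_000) -> int:
    """Smallest depth at which P(at least k variant reads) reaches `power`."""
    for depth in range(k, max_depth + 1):
        if prob_at_least(k, depth, allele_fraction) >= power:
            return depth
    raise ValueError("power not reached within max_depth")


# Chance of seeing even one supporting read at 50X for a 0.001 allele fraction.
print(prob_at_least(1, 50, 0.001))
# Depth needed for a 99% chance of at least one supporting read.
print(min_depth(1, 0.001))
```

Under these assumptions the chance of seeing a single supporting read at 50X is only a few percent, and the depth needed for 99% power is in the thousands. Requiring more than one supporting read, which any practical caller would, pushes the required depth higher still.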
So germline is a polymorphism, so there's a variant there, but it's there in both the normal and the tumor. And wild-type means basically there's no variant there at all. It's a complete signal artifact. So germlines are true signals, but they're signals that are shared between the tumor and the normal, whereas the wild-types are just completely artifactual. So you can think of germline as a biological false positive and wild-type as a technical false positive. You were feeding the read sequencing to the classifier? Yeah, both data sets are being fed to the classifier. So the results of this are in this paper here. But basically the idea is that calculating these features and using a discriminative classifier is a really potent way to eliminate false positives. So this is a receiver operating characteristic curve, plotting sensitivity against one minus specificity. And the most accurate you could be is up in the upper left corner here. And so it's not perfect, but it's much, much better than, for example, standard tools that don't take these features into account. Yeah, so RF is random forest, then Bayesian additive regression trees, support vector machines, and logistic regression. So the type of classifier is less important than actually just calculating the features. Yeah, which is actually a common result. The feature calculation in most discriminative classifiers is actually the most important thing, in that if you present the right data to the classifier, any reasonable discriminative classifier will do a good job. Okay, so, yeah, I'm going to skip over this. So let's assume now that we can reliably detect mutations. You'll use a tool called Strelka in the lab. This was developed by Illumina and it performs similarly to MutationSeq. We've done a lot of benchmarking against Strelka. They have slightly different properties but more or less will produce similar results.
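To make the receiver operating characteristic curve described above concrete, here is a small sketch that turns classifier scores and validation labels into points of sensitivity against one minus specificity, plus the area under the curve. The function names and the toy scores are invented for illustration. This is not the evaluation code behind the results shown in the talk.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points obtained by sweeping the
    score threshold from high to low. Labels are truthy for real mutations."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, y) in enumerate(pairs):
        if y:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a run of tied scores.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy example: four candidate calls with hypothetical classifier scores.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, area_under_curve(pts))
```

A curve that hugs the upper left corner gives an area close to 1, which is the sense in which the upper left is "the most accurate you could be".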
And it's a nice piece of software that's available from Illumina. And so I think, are you doing exomes in the lab? Yeah, so you'll actually execute Strelka on a tumor-normal exome to find mutations. So let's say now that we can detect mutations. How do we interpret their importance, and what are the different properties that we should look for? One of the more robust measures of whether a mutation is important is how often it occurs in the patient population. Now that's not a strictly straightforward calculation, because in fact one of the most frequently mutated genes in all of cancers is a gene called Titin. Does anybody know what's interesting about Titin? Yeah, so it's a huge gene, so it's got lots of genomic real estate and lots of opportunity to accrue mutations. So one has to normalize against that, and a lot of methods, for example MuSiC, this paper here, have endeavored to account for different properties that would yield mutations by chance. Timing of DNA replication is important, so where in the cell cycle the genome is being replicated, and also the length of genes and the sequence composition bias, et cetera, all factor into these calculations. But the basic idea is that if a gene is more frequently mutated in the population than would be expected by chance, then that's a good indicator that it may be an important biological gene. Yeah, so there are cell line-based assays that have determined this. So for different tissue types or different cell types it's been experimentally determined which parts of the genome were replicated when, along the full timing of DNA replication. So those models can be used and incorporated into these types of calculations. So that's incorporated into gene burden analysis? I mean, is it influenced by mutating proteins just with that size?
Yeah, so the issue is that at the end of the replication cycle there are fewer nucleotides left in the pool, and so towards the end of the replication cycle you get more mismatches that are incorporated as a result of unavailable nucleotides. Yes. Yes, that's a great point. So if you have concurrent RNA-seq data, one can look at expressed mutations, and certainly for neoantigen presentation that's important, so you want to make sure that the mutation is at least expressed. For function, though, it's not as straightforward, because through nonsense-mediated decay, truncating mutations for example can be missed in transcriptomes simply because these transcripts are being degraded. So those are functional mutations that result in a loss of expression, but you would never see them in the transcriptome. Sure, sure. And so the paper I described earlier this morning, when I was talking about joint analysis of expression and mutations, essentially leverages that idea in a systematic way. Okay, so population frequency is an important measure. One more question. When a gene or locus has characteristics, physical or chemical, that make it more prone to mutation, that could actually serve as a factor to down-weight the importance of that mutation in the creation of cancer or its progression, because it could simply be produced by chance. Does it work that way? Yes, that's the principle. It basically controls for it. So it will down-weight the contribution of those mutations. So then we can look at the clonal dominance of the mutation as well.
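The normalization argument above, that a very long gene such as Titin collects many mutations by chance, can be illustrated with a deliberately simplified burden test. The sketch assumes a single uniform background mutation rate per base per tumor and a Poisson model for the number of mutations seen in a gene across a cohort. All numbers, and the names poisson_upper_tail and gene_burden_pvalue, are hypothetical. Methods like the one cited in the talk also account for covariates such as replication timing and sequence composition, which are not modeled here.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing the tail directly so that
    very small p-values are not lost to cancellation."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space for numerical stability.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def gene_burden_pvalue(observed: int, length_bp: int, n_tumors: int,
                       rate_per_bp: float) -> float:
    """Probability of seeing at least `observed` mutations in a gene by chance,
    given its length, the cohort size, and a uniform background rate."""
    expected = rate_per_bp * length_bp * n_tumors
    return poisson_upper_tail(observed, expected)


# Hypothetical cohort of 300 tumors with a background rate of 1e-6 per base.
p_long = gene_burden_pvalue(observed=33, length_bp=100_000, n_tumors=300, rate_per_bp=1e-6)
p_short = gene_burden_pvalue(observed=12, length_bp=1_000, n_tumors=300, rate_per_bp=1e-6)
print(p_long, p_short)
```

In this toy setting the long gene carries more mutations in absolute terms but is entirely consistent with chance, while the short gene with fewer mutations is far beyond what its length would predict. That is the down-weighting effect described in the answer above.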
So this is a landmark paper by Marco Gerlinger and company from Charlie Swanton's group in the UK. What this group did is execute regional sampling of kidney cancers. They took multiple tumor foci and sequenced, in this case, the exomes of these different lesions, and then plotted on the x-axis here the mutations that were seen, and on the y-axis each sample that was sequenced. A gray box in this matrix represents that that mutation was seen in that particular sample, and so we can look at this matrix and start to characterize mutations. These are mutations that were present everywhere and very likely represent the ancestral mutations that are at the top of that phylogenetic tree that I was referring to earlier. So that's what's shown here. And then you have mutations that are lesion-specific, which must have occurred later on in evolution and characterize specific lesions. The implication here is that clones that are unequally distributed in different regions very likely will have different behaviors in the context of therapy, and so this is an interesting spatial experimental design. We've done similar work, which I hope to get to today, in ovarian cancer that really describes how this can be used to potentially learn something about how tumors spread to different anatomic sites as well. So this is interpretation of mutations from the perspective of clonal dominance. Charlie likes to call these truncal mutations, the ones that are on the trunk of the tree. These represent features that are likely present in all cells, and if we hope to target a mutation and hope that all cells would respond equally, then this is the place to look for such mutations. So then, returning to this idea of mutation spectra and processes that lead to different substitution profiles.
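The way the mutation-by-sample matrix is read above can be sketched in a few lines. The sketch labels a mutation as truncal if it is seen in every sequenced region, private if it is seen in exactly one, and shared otherwise. The region names, mutation identifiers, and the function classify_by_region are made up for illustration, and the sketch treats presence calls as error-free, which real multi-region data would not allow.

```python
def classify_by_region(presence, all_samples):
    """Label each mutation by how widely it is distributed across regions.

    presence: dict mapping a mutation id to the collection of samples in which
              it was detected (the gray boxes in the matrix).
    all_samples: every region that was sequenced.
    """
    all_samples = set(all_samples)
    labels = {}
    for mutation, samples in presence.items():
        seen = set(samples) & all_samples
        if seen == all_samples:
            labels[mutation] = "truncal"   # present everywhere, likely ancestral
        elif len(seen) == 1:
            labels[mutation] = "private"   # lesion-specific, likely acquired later
        elif seen:
            labels[mutation] = "shared"    # present in a subset of regions
        else:
            labels[mutation] = "absent"
    return labels


# Hypothetical regions and mutations, for illustration only.
regions = ["R1", "R2", "R3", "M1"]
calls = {
    "mut_A": ["R1", "R2", "R3", "M1"],
    "mut_B": ["R1", "R2"],
    "mut_C": ["M1"],
}
print(classify_by_region(calls, regions))
```

Mutations labeled truncal here correspond to the trunk of the phylogenetic tree, which is where the talk suggests looking for targets that all cells would share.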
So this is work by Mike Stratton's group, published in Cell in 2012, where they characterized breast cancers as having acquired mutations through different endogenous processes. So in particular this profile here. The way to read this is that the y-axis is the frequency, it's like a histogram, and each of the bars represents a particular trinucleotide context, so the mutated base and the preceding and the succeeding base, and then we count the number of mutations that occur in each of those bins. So there are 96 bins across here, and you can see that there are decidedly non-random patterns. This process here, where you have enrichment for C-to-G and C-to-T mutations, is a pattern that's known to be caused by an enzymatic process called deamination, through APOBEC enzymes. So this is a cytosine deamination process that converts C bases to either Gs or Ts. And this is interesting because, unlike the UV exposure or the lung cancer smoking exposure, this is an endogenous process that exists within cells, and it's likely the result of up-regulation of APOBEC enzymes. There's some thought out there that inhibiting APOBEC enzymes would potentially reduce the probability that additional mutations would accrue in these cancers, and so that's an active area of research. And then we have signature C, where is it? There's a particular process that is specific to BRCA tumors, and it's a signature that's consistent with homologous recombination deficiency. And so again, using the genome-as-a-biomarker idea, the pattern of mutation across the genome might be an indicator of a particular vulnerability in the cancer, and so the HRD signature may very well be used as a way to stratify cases for PARP inhibitors, as we talked about this morning.
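The 96 bins described above come from six substitution types, written relative to the pyrimidine base, times four possible preceding bases and four possible succeeding bases. The following sketch enumerates those bins and assigns a mutation to its bin, taking the reverse complement when the reference base is a purine. The label format "T[C>T]A" and the function names are choices made for this example, and the input mutations are invented.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def all_channels():
    """The 96 bins: 6 substitution types x 4 preceding x 4 succeeding bases."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]


def channel(context: str, alt: str) -> str:
    """Bin for a mutation given its trinucleotide context (preceding base,
    mutated reference base, succeeding base) and the alternate base."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or any(b not in BASES for b in context + alt):
        raise ValueError("context must be 3 bases and alt 1 base from ACGT")
    if context[1] == alt:
        raise ValueError("alternate base equals reference base")
    if context[1] in "AG":
        # Purine reference: describe the mutation on the opposite strand.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def spectrum(mutations):
    """Count mutations per bin; `mutations` is an iterable of (context, alt)."""
    counts = Counter(channel(ctx, alt) for ctx, alt in mutations)
    return {ch: counts.get(ch, 0) for ch in all_channels()}


# Invented example: two C-to-T changes at TCA (one seen on each strand), one C-to-G.
profile = spectrum([("TCA", "T"), ("TGA", "A"), ("TCT", "G")])
print(len(profile), profile["T[C>T]A"], profile["T[C>G]T"])
```

Plotting the values of such a spectrum as a bar chart in bin order gives the kind of histogram shown on the slide.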
Okay, so that's another way of interpreting these mutations. And then this is now sort of the second part of that gene expression paper that I mentioned this morning, and this is really looking at what the impact of a mutation is across a pathway. Okay, so this is the uterine cancer TCGA data, and these are all cases, shown in columns, that have a beta-catenin mutation, so a mutation in CTNNB1. This is a known recurrent mutation, and this mutation is also known to constitutively activate the Wnt signaling pathway. The genes shown here are Wnt signaling pathway genes, and the heat encoded in this matrix is such that increasing red means up-regulation of the expression of that gene and increasing blue means down-regulation of that gene. Importantly, all of these cases are beta-catenin mutants, but you can see that only three-quarters of the cases actually have this constitutive activation of Wnt signaling, and this set of cases here, maybe 25%, don't exhibit the same expression profile. So interpreting the activity of the mutation with the gene expression tells us that, even though these are beta-catenin mutant cases, they very likely don't have the same biological properties as these cases. And if we dig a little bit further we can see that actually a lot of these cases are the POLE mutants, so beta-catenin in this case is probably a passenger mutation. Just because there are so many mutations accruing in these cases, beta-catenin just happens to be one of the genes that are hit, but it's not actually inducing a biological change. And then some others are MSI cases as well. This is really important because as we move towards panel-based testing for drug indications, for example, through DNA alone it's clear we'll miss some important signals that could be relevant to understanding whether a patient will respond to a treatment regimen. So in this case there aren't any targeted agents that inhibit
Wnt signaling, but if there were, then you'd certainly want to know the difference between these two patients, and beta-catenin alone may not be sufficient to do that. I mean, the signatures that you can see, are these identical or very similar to Alexandrov? It is the same. This paper came out just before. I think they're very close. I mean, it's the same group. So this is actually a breast cancer specific paper, whereas Alexandrov was pan-cancer. Some of the available tools we've talked about. SAMtools is a nice suite of tools that are really good for manipulating data sets in SAM and BAM format. Using BAM format has really become a standard in the field, and there's an API, and it's implemented in C, and it's nice and fast and efficient. You may have heard of the Genome Analysis Toolkit from the Broad. The initial paper is here. It's implemented in Java. It's a bit slow to run, but it nonetheless has some interesting properties, local realignment being an important step that can be executed within GATK. What that means is that, given the presence of those indels that I was showing you, this tool can essentially map those indels and then take reads and realign them back to the genome, taking into account the presence of that indel, so you get much cleaner alignments at those regions. MuTect is probably the most widely used somatic mutation caller. This is also from the Broad and has underpinned a lot of the TCGA mutation calls. This is just showing a comparison, and they claim that it's very sensitive to low prevalence mutations. And then here's Strelka. This is from Chris Saunders and from Illumina, and it shows the workflow. This is our contribution, which is MutationSeq, and so we have a standalone Python package and some visualization tools for plotting these trinucleotide substitutions and some of the distributions of alleles that we observe. I'll just skip over that. So it's worth then talking about how we encode mutations in a format that is workable, and
so the standard format, it's really a community standard I would say, it's not a technical standard, is called VCF. It essentially contains eight tab-delimited fields, and the precise definition is listed here at this website, and you'll learn about VCF in the lab. What's interesting is that we can compute a lot of attributes about the data, which again can be used to do some post-processing to filter out events that look like they may be poor quality. So basically this is a tab-delimited format where each mutation is represented in a row in this text file, and this can be manipulated using a myriad of different tools, including VCFtools, which is a package, or any kind of spreadsheet or any kind of statistical program that can parse out columns in a tab-delimited file. And so that's the standard. This is an example of the header for mutations. You can see some different parameters that are used and some of the commands that were used to create the results file, and you can see the header will always start with two hash marks. These are lines at the very top of the file that will be ignored by any tool that wants to process the data, but they give you a human-readable, well, sort of human-readable, more or less human-readable description of what was done to produce the output, and they are quite useful. It tells you also how to read the info field. The info field can be structured in many different ways, and so this tells you exactly what the different fields in the info field look like. And so this is what it looks like. You have chromosome, then you have position, and you can have an ID for the given position, and that's just represented by the null or default; you can see dots. Then the reference allele, and then you have the quality. I was missing one header here for some reason; the reference allele and the alternate allele should be here. There's the quality score, whether it passes a filter, and then the info field, which has a different
attribute. So here, if we go back here, you can see PR is the probability of somatic mutation at that particular locus, for example, and you can read the details in this description. So here's a list of tools and some visualization tools. IGV is often used to manually review the mutations that are found. It's a good practice to get into, just reviewing some mutations, especially if you think they're important or you want to follow them up experimentally. It's really good to validate with some degree of visualization. And then there are a number of annotation tools. Once you've predicted mutations in this format you basically just have a list of 10,000 variants in a tumor, and of course you want to annotate them for biological function. There are packages available, MutationAssessor, ANNOVAR, and SnpEff being three of the applications. Presumably you're using SnpEff? SnpEff. So in the lab you'll use SnpEff, and you'll go through the process of taking essentially raw BAM files, calling mutations, and then annotating them with SnpEff, and you can see that whole process manually. So let's see, how long do we have for this session? Does anybody know? Okay, so we're supposed to be done soon. Okay, so I'm going to just briefly show you some recent results. It's telling me it wants me to stop. Okay, so my lab is particularly interested in studying the evolution of high-grade serous ovarian cancers. These cases represent 70% of ovarian cancers. They often respond initially to standard-of-care cisplatin-based therapy, but almost 80% will recur. And the other feature is that often at diagnosis there may be widespread disease throughout the peritoneal cavity, and there may be lesions on different organ surfaces of the peritoneum. So we really want to understand, first of all, what is the nature of the spread, and ultimately whether we can learn something about clones that develop in different parts of the peritoneum and their potential for acquiring properties that would lead to
treatment resistance. So this is work that was spearheaded by Andrew, who did a really magnificent job of taking very complex data and computing over these data sets to try to infer something about the biology of ovarian cancer spread. What we did is collect the sampled material at primary debulking surgery and execute a battery of really deep genomic interrogation, including whole genome sequencing, targeted deep sequencing, and single cell sequencing, to identify the constituent clones that existed in these different regions, measure their abundance, and then map the clonal spreading patterns. I don't have too much time to explain this, so I'll just skip ahead. The key message is that what we're showing here is that we can infer a clone phylogeny similar to the one that I showed in the very first slide, and I showed that for a reason, so that we could come back to this idea of how the phylogeny is useful. Once we have the phylogeny we can start to measure which clones are present at which levels of abundance in different parts of the anatomy. What's shown here is all the different samples that we measured, and we can see, for example, that this part of the phylogeny was present in this case in non-ovarian sites primarily, so we have an omentum site here and a small bowel site. We also think that this tumor actually is a bilateral tumor, and so these lesions are also present in the left ovary and left fallopian tube. These clones on the other end of the phylogeny, however, were very specific to the right ovary. So you can see that in this bilateral disease the clonal compositions, although they're clonally related, actually are quite different. So there is this incredible variation in terms of what constitutes the disease, and again this is all at time of diagnosis, so the repertoire of clones that's present at diagnosis is unequally distributed in anatomic space. And then we think that this all emerges from a single diverse
site that is characterized here by the left ovary, which has representation from both parts of the phylogeny, so the green bit and the yellow bit. So there may be a site that's permissive to lots of diversity, and then everything radiates out from there. So one last anecdote and then seven. So it just came out. Sorry? Correct. So essentially there was no specific collection protocol. We tried to study all specimens that were extracted and labeled at the time of debulking surgery. And so why is this relevant? Here's a case where we had a temporal sampling, whereby we collected samples at the time of diagnosis, and then there was a first relapse and a second relapse. So there's a brain metastasis here and then a secondary relapse back to a uterosacral region. This sample here at the time of diagnosis is actually diverse. You can see that there are constituents from this part of the phylogeny and then also a yellow clone and a pink clone. And this is significant because the first relapse is composed entirely of this green clone, present at diagnosis already, but not the yellow or the pink, whereas this relapse here is characterized entirely from this part of the tumor. So all the clones that actually characterize the relapses in this case, the resistant clones, are present at diagnosis. We can learn something about what gives rise to resistance at diagnosis, and we may be able to intervene in a way that would prevent those relapses in the first place. We did do single cell sequencing. Okay, so I think that's it. It's been a long day. A number of people were involved in the work I presented, some of whom are in the room: Fong, Andrew, and a number of the students in the lab, and there's a lot of funding that underpins the work I do. And of course I think none of this work is possible without the patients who are involved and donate their samples for research. It's important to acknowledge that as well. Thanks for your time and your attention and your questions, and I'll be around
for a little bit, and then I gotta catch a plane. Fire away with your questions. Yes? Yeah, so I had done some work in sampling breast cancer over time. We published a paper in 2009 that compared a primary and a relapse specimen with nine years in between, and that was an interesting exploration of what clonal dynamics can be observed in time. Then the question is what the initial substrate is, the full clonal repertoire that is present at diagnosis. I remember talking with pathologists and surgeons at the BC Cancer Agency and realizing that this was something that was particularly interesting in ovarian cancer, simply because it's so widespread at presentation. There's also an opportunity through the debulking process to actually execute this study in a feasible way. The material is there; it just needed to be collected in a way amenable to study. We started that protocol six years ago now, and we've now accrued well over 35 patients where we have multiple samples. Yeah, that's an excellent question. I was going to spend some time on clinical panels, and so I'll try to answer the question by Dina. There's a spectrum of offerings from commercial labs, right from hotspot mutations, where we're only looking at these gain-of-function mutations for which there are targeted inhibitors, highly actionable but with minimal genomic real estate, so very effective assays, through to the other end of the extreme, which is Foundation Medicine, for example FoundationOne. FoundationOne has on the order of 500 genes, I think 300 to 500 genes, and they do full gene sequencing of those 300 to 500 genes. It's a much more expensive test, so it costs anywhere between 3,000 and 5,000 dollars per test. That's a cost point that the system wouldn't be able to bear, certainly in Canada. And the other thing that's important is that in order to effectively administer these tests within the clinical workflow, the pathology workflow, you would know this, it has to be on FFPE tissues, and normal DNA just isn't available,
so it's not standard of care to do a blood draw for this type of testing, and often a source of normal DNA is just not accessible. And then the tumor tissue itself is usually formalin-fixed paraffin-embedded material, sometimes worse than that, fine needle aspirates or other poor quality material, and so these assays have to work on this very, very suboptimal material. In this study, this is pristine material that's cryopreserved. It's meant to be studied rigorously with DNA-based assays. But in morphological, histological diagnosis, what's most important is preparations that preserve the cell shape, and so that's what's available in the clinical workflow. So there's FoundationOne. We have a company called Contextual Genomics that also works off FFPE, and there are a number of other offerings. There are probably about 30 to 40 companies in the space now across North America and Europe that are offering these panel-based tests, and they cross the spectrum from maximum actionability of results, where only on-label indications are used, all the way through to full gene testing, where the majority of the variants that come out would not have an action that can be taken by the clinician, and so the interpretation of those mutations is very difficult. So they will search against all of this, again, action of mutation and selective. So there will be different classes of mutations that are reported. The actionable mutations will be reported out first, where there are on-label indications. In colorectal cancer it's a standard test. Is it individual to verify the mutation by negation of the branch? Oh, I see. So the algorithm is usually part of the certification. The analytics are part of the certification of the assay, and that's usually done through proficiency testing and through sample exchanges between different labs. So you have orthogonally annotated samples where you know the mutations are there, and then you perform the test on those samples and you make sure
that those same mutations can be recovered and that there aren't a lot of extra mutations that the other test didn't find. So there's proficiency testing and analytical validity that needs to take place for that test to be accredited and used in clinical practice.
Is it still open source, or is it too old?
No, sometimes it's proprietary, sometimes it's a mixture, and in many ways it's still an evolving field. The informatics is not yet stabilized, nor is it certifiable under the ISO-type standards that would usually be administered for other tests and devices.
VAST, faster application tools that could have to be the best?
Yeah, so that's from MD Anderson, right? Oh, in Utah, okay. I'm sorry, I'm not familiar with it. Yes?
So the brain was composed entirely of this clone here, and the others were composed of, for example, the left ovary site? It's completely...
Yeah, completely separate.
Oh, yeah. So what we have done is we have validated on the order of 3,000 mutations in the initial set through resequencing. We take the same DNA pool, do targeted PCR, amplify, and resequence. We had labels that had criteria associated with them. If we saw the variant above a 5% fraction (this is with lots of coverage, so 10,000x), we called it real. Below 5% but above 0.5%, we called it sort of undetermined, and we used that. And if it was 0, then it would be...
MicroRNA?
Oh, that's a good question. I'm certainly not an expert in microRNA, but all the TCGA papers, for example, explored microRNA, so for a lot of the tumor types explored in the TCGA repository you can actually download and study microRNA distributions. MicroRNA clearly plays a role in regulation, so they're important molecules to keep in mind.
So has it reached anything close to the importance of what we know about mutations, or the copy number?
Well, microRNAs can be affected by copy number, and they can also be affected by mutations, and
so actually Marco Marra has done a lot of work in trying to relate microRNA and mRNA expression, and there's some work in that area.
What I was trying to... I probably should ask it differently. These mechanisms of adding to the function or silencing a tumor suppressor gene, all these mechanisms: my understanding is that what we can currently explain of tumorigenesis is a lot more with those mechanisms than by control of transcription.
I think it's a fair statement, sure. I think it's probably just because it's underexplored, and I would make a similar comment about methylation and epigenetics as being clearly fundamentally important to the biology of cancer cells but only now beginning to be explored in rigorous and robust ways. We've learned a lot, and then there's a whole other dimension, which is learning how cancer cells interact with the microenvironment. The normal cells are a bit of a nuisance when statistically interpreting mutations, but in fact, of course, they have lots of biological influence on the malignancy itself, in terms of immune surveillance and in terms of stromal-epithelial crosstalk between cancer cells and fibroblasts, etc.
Yes? In the plot you showed from Alexandrov, are those synonymous and non-synonymous mutations together?
Yes, so they're typically all mutations. The best way to do that type of analysis is to look at all mutations in the genome, because you have more statistical power, and because you're chopping all the mutations into 96 bins, you want to make sure you have enough representation across the whole spectrum for the pattern.
And then with the mutational signatures, how do they know, for signature 19, which one it is? You have lung cancer, but bladder cancer, like lung, is tobacco related as well, and that's not in the same cluster.
Okay, I didn't realize that. Yeah, I'm not sure.
Last question. In multiple cancers you have debulking, which improves outcome.
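As a rough illustration of the "96 bins" mentioned in the answer on mutational signatures above, the sketch below assumes the widely used convention of six substitution classes reported on the pyrimidine strand, each split by the 16 possible pairs of flanking bases (6 x 16 = 96). The function names `channel` and `spectrum` and the toy mutations are hypothetical, chosen only for this example; this is not the pipeline used in any of the studies discussed.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Six substitution classes, written with a pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 6 substitution classes x 16 flanking-base pairs = 96 bins.
ALL_CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def channel(context, alt):
    """Map one substitution to its bin label, e.g. 'T[C>T]A'.

    context: three reference bases centered on the mutated base.
    alt: the mutant base on the same strand as `context`.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt == context[1]:
        raise ValueError("need a 3-base context and an alt that differs from ref")
    if context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context = reverse_complement(context)
        alt = reverse_complement(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def spectrum(mutations):
    """Count (context, alt) pairs into the 96 bins, in ALL_CHANNELS order."""
    counts = Counter(channel(context, alt) for context, alt in mutations)
    return [counts.get(name, 0) for name in ALL_CHANNELS]


# Toy input (made up): the first two are the same event seen on opposite strands.
toy_mutations = [("TCA", "T"), ("TGA", "A"), ("ACG", "T")]
toy_spectrum = spectrum(toy_mutations)
```

With only a handful of mutations most of the 96 bins stay at zero, which is the practical reason given above for pooling all mutations in the genome rather than restricting to coding changes.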
What do you think, hypothesis-wise: do you think the benefit is there from decreasing the availability of residual disease?
Oh yeah, I think there's no doubt about it that one of the most confounding factors is minimal residual disease after debulking. If there are parts of the disease that can't be extracted, then that is a factor in terms of relapse. So effective surgery is going to be a very potent indicator of relapse, I think.
What are you going to say about the metastases that you need to deal with?
Well, in metastatic disease, that's inoperable typically. Are you talking about ovarian cancer?
No, I do kidney treatment, in people all the time.
Kidney or ovarian or anything?
Yeah. So you remove the primary, but you still have metastases that develop?
Yeah, so sure. I mean, metastatic disease is the problem; people succumb to metastatic disease. So being able to treat metastatic disease is probably the fundamentally most important challenge. Yeah, great. Yes?
With the cancer where you did single-cell analysis: one, how many samples were analyzed, and what was the success rate of capturing cells in the microfluidics?
Yeah, that's a good question. We looked at three different patients and multiple samples for each of these patients. In total, I think it may have been ten-some-odd samples. I can't remember exactly; it's documented in the paper. Across that number of samples, we captured 1,600 cells, but each assay was patient specific, so we picked mutations that we wanted to resolve, essentially at a single-cell level. The Fluidigm Access Array that we used at the time was 48 by 48, so 48 cells and 48 loci. That's the dimensionality of the data. There was definitely variability in the efficacy of capturing specific loci. Some loci just dropped out; we didn't get any data from them, so that's a PCR that didn't work.
And then some cells were also complete duds: we had captured material in the cells, but it didn't produce any product. So these are some of the things that happen across both dimensions, locus and cell. You can get no data or partial data, and it creates a missing-data problem. I think it's highly dependent on the tumor. Some tumors dissociate very nicely, and there's probably a biological property there. Others don't. In this case, we actually did nuclei, and that seemed to work best for these tissues. But then you have the risk that the nuclei are no longer protected, so they can degrade and get into the product. So I don't think there are specific answers. It was variable from sample to sample and from case to case. Some samples we had to try repeatedly and just never got to work properly, whereas other samples worked the first time. So there's a property in the tissues that probably gives rise to that. I don't know what it is.
I mean, the lack of getting good cell suspensions, at least that's what I'm trying to understand; the rest of my career is on it. You don't get all the cells in all the biological villages, thereby a lower cell.
Sure. And it can be unequal-sized nuclei as well. All right.
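The cell-by-locus data described in this last answer can be pictured as a matrix with missing entries. The minimal sketch below assumes each row is a cell, each column is a locus, and `None` marks an entry with no data; it flags cells with no data at all ("duds") and loci that failed in every cell (a failed PCR). The function name `summarize_dropout` and the toy matrix are hypothetical and much smaller than the 48 by 48 arrays mentioned above.

```python
def summarize_dropout(calls):
    """Summarize missing data in a cell-by-locus matrix.

    calls[i][j] is 1 (variant seen), 0 (reference only) or None (no data)
    for cell i at locus j. All rows are assumed to have the same length.
    """
    n_cells = len(calls)
    n_loci = len(calls[0]) if calls else 0

    # A "dud" cell produced no data at any locus.
    dud_cells = [i for i, row in enumerate(calls)
                 if all(value is None for value in row)]

    # A failed locus produced no data in any cell.
    failed_loci = [j for j in range(n_loci)
                   if all(row[j] is None for row in calls)]

    missing = sum(value is None for row in calls for value in row)
    total = n_cells * n_loci

    return {
        "dud_cells": dud_cells,
        "failed_loci": failed_loci,
        "missing_fraction": missing / total if total else 0.0,
    }


# Toy matrix (made up): 4 cells x 3 loci.
toy_calls = [
    [1, None, 0],
    [None, None, None],   # cell 1 is a complete dud
    [0, None, 1],
    [1, None, None],      # partial data for cell 3
]
toy_summary = summarize_dropout(toy_calls)
```

Separating whole-row and whole-column failures from scattered partial dropout is one simple way to see how much of the missing data comes from each dimension before any downstream analysis.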