Okay, so this is, I think, officially the halfway point of the workshop, so I'm going to take a deep breath. You're over the halfway point. All right, very good.

So I wanted to begin with this really quite interesting statement: "Nothing in biology makes sense except in the light of evolution." This was coined by the famous evolutionary biologist Dobzhansky in the 70s, really in response to educational policies in the US. But I would like to state that the same is true when we think about cancer biology. And really this has a historical context. I've already talked about Boveri, and Nowell and Hungerford, and people have been thinking about cancers as evolutionary systems for a hundred years. But we've lacked the measurement technology, I think, to be able to profile evolution in cancers, and I think we're getting to a very exciting time in trying to discern patterns of evolution in cancer.

This idea was synthesized in a seminal paper 40 years ago by Peter Nowell, who was one of the co-discoverers of the Philadelphia chromosome. He placed cancer progression in the context of a phylogenetic tree. If you think back to your high school biology, you may have seen a phylogenetic tree that describes the ancestral relationships between species over evolutionary time. Well, one can cast a cancer in a very similar context, where we have a transition from a normal cell: there's an acquisition of some sort of genetic mutation that transforms that cell to a malignant state, and that clone (a clonal expansion is what we call a population of cells that have an identical genotype) can then expand in different ways. Different cells can then, through stochastic processes, acquire additional mutations, and that leads to this branching process. This very elegant theory predicts certain very important features of tumors, and the first is that tumors change over time. We all
probably appreciate that. Then acquisition of mutations leads to phenotypic changes that may, in certain circumstances, confer selective advantages and clonal expansions. So you can imagine, in the context of therapy, where you've got a population of cells with different underlying genotypes, putting the selective pressure of chemotherapy on those cells will drive that population through an evolutionary bottleneck. What emerges on the other side may be a very rare clone that was in existence prior to treatment, but it is the thing that expands and ultimately maybe kills the patient later on. So understanding how this process works is really critical to gaining insight into whether cells are sensitive or resistant to chemotherapy, and the mechanisms why.

The other important aspect of this is that we have a way of interpreting mutations in the context of this tree. This theory assumes that mutations that are born at a certain time are propagated to their descendant clones. Every time you have a cell division, those mutations will be passed on to the descendants. So the clones that are born at this point in the tree carry all of the mutations that were present in that initial clonal expansion. That tells us something about when in the process a mutation was acquired, and its potential for being responsible, for example, for tumorigenic initiation in the first place. So casting cancers in this evolutionary framework has a lot of advantages.

One thing that's important to consider is that at diagnosis we may have a mosaic of cells that looks something like that, where the colors represent different underlying genotypes, so you have different populations of cells with different genotypes. Over time that may transform into something like this, especially if you have a selective pressure in here that then drives those cells
through an evolutionary bottleneck that has some advantages for this green population but maybe kills the other populations. So this has major implications for how we might use somatic mutations to study tumors and how they progress. Any questions on this?

So we need to think about cancers in this way, and then all of the genetic abnormalities that we're studying and trying to interpret will make sense from this perspective. Otherwise, when we sequence or measure a genome, there's a lot of chaos going on, so interpreting it in the light of evolution is very, very helpful.

Another way to look at this is that you have this population, where the cells are colored according to genotype, and the fundamental questions that arise in this context are: do these clonal genotypes actually drive different phenotypic behaviors? And, very importantly from a clinical perspective, how does this relate to treatment response, progression, metastasis, etc.? This has been discussed in numerous different venues. We've done some of the early work in terms of profiling primary and metastatic tumors and comparing their mutational profiles to really measure evolution, and there have been some review papers now that have come out. This is a particularly good one from Vogelstein that talks about the different levels at which clonal populations can distribute in different patients and also within a patient. So this is a decent review to look at.

This was a seminal paper that came out in 2012, whereby this group, Swanton's group at UCL in London, performed multi-region sampling of renal cell carcinomas.
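As a minimal sketch of the inheritance assumption described earlier (a mutation born in one clone is carried by every descendant clone), the snippet below represents a clone tree and derives each clone's full genotype. The clone names, mutation labels, and tree shape are hypothetical placeholders, not data from the talk.

```python
from functools import reduce

# Hypothetical clone tree: clone -> (parent clone, mutations born in that clone).
# All names here are illustrative assumptions.
CLONE_TREE = {
    "founder": (None, {"mut_A", "mut_B"}),
    "clone_1": ("founder", {"mut_C"}),
    "clone_2": ("founder", {"mut_D"}),
    "clone_3": ("clone_1", {"mut_E"}),
}


def genotype(clone, tree=CLONE_TREE):
    """Return every mutation a clone carries: its own plus all ancestors'."""
    mutations = set()
    while clone is not None:
        parent, born_here = tree[clone]
        mutations |= born_here
        clone = parent
    return mutations


def shared_by_all(clones, tree=CLONE_TREE):
    """Mutations found in every listed clone, i.e. the earliest shared events."""
    return reduce(set.intersection, (genotype(c, tree) for c in clones))


if __name__ == "__main__":
    for name in CLONE_TREE:
        print(name, sorted(genotype(name)))
    print("shared:", sorted(shared_by_all(["clone_2", "clone_3"])))
```

Under this model, a mutation seen in every sampled clone must have been acquired before the clones diverged, which is why its position in the tree says something about timing.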
So they looked at the primary site here and then multiple metastatic deposits throughout the pleural cavity, and then sequenced the exomes of each of these lesions. What they found is that there was a core set of mutations that were shared everywhere. These really represent the trunk of the tree, if you will. This is the first set of mutations that gave rise to a clonal expansion, but then over time each sample diverged in a way where it acquired new mutations, and these can be viewed as descendant clones. So if you think of this as the trunk of the tree, these would be closer to the leaves of the tree.

This has many implications. Most importantly, I think, in the clinical domain: each one of those rows represents a piece of tissue that could have been biopsied for, let's say, a clinical assay, and each one of those rows has a different mutational profile. So if you want to make a conclusion about a particular patient and their tumor, sampling from just one place will definitely give an incomplete representation of the genomic profile of that particular tumor. So this has, I think, really critical implications. Yes?

Yeah, so that's part of the claim about this mutation. Here, for example, there are two independent PTEN mutations in two different branches of the tree. Here's a missense mutation, and this is a splice-site mutation, and what they're claiming is that that's an example of convergent evolution, if you will, whereby different branches of the tree acquired different mutations but in the same gene. Nonetheless, that driver mutation is acquired after the expansion of the initial clone. So it's not a tumorigenic event, but it is obviously still important. Yes?

No, it's this block here. The gray here means mutated. Yeah, it's kind of a funny color scheme.
They picked it, but yeah, it's the opposite of what you'd expect. In our paper we put this the other way around; we have a paper that came out soon after this in ovarian cancer. I never understood why they colored it this way, but they did. So the blue is actually the empty and the gray is where you have a mutation. Yeah, okay.

All right, so I'm glad you raised that question because it invokes this concept of driver versus passenger mutations. So who wants to take a stab at defining what a driver mutation is without looking at the notes? Yeah? Yeah, sure. Okay, good. So I think those two things that you just said are the most important: selective advantage and phenotype. Phenotype implies that there are some distinct characteristics that are associated with, or even caused by, that particular mutation, and then whether selection acts on those characteristics or not is probably the ultimate arbiter of whether that mutation is actually having an impact. Those phenotypes can be different things. They can be transformation from a normal cell to a malignant cell.
They can be acquisition of metastatic potential, so you could have cells that are in the primary tumor and then some cells might acquire the ability to migrate, and that's a phenotypic advantage. A different phenotypic advantage may be upregulation of a drug pump that allows cells to pump out toxins and therefore become resistant to chemotherapy. So there are all kinds of different ways by which a phenotype can be changed, and, depending on the selective pressure, what expands out, I think, has some bearing on what happened.

The other really important concept is this: when one studies traits in species, the ultimate evidence for a phenotype is when you have independent evolution of that same trait, let's say on geographically separated continents. That suggests that this idea of convergent evolution is quite important. Many groups are now starting to see that along the branches of a tree one can acquire mutations in the same gene but in independent clones, and that's a really strong measure of selection, because then you have selection operating independently twice and selecting for the same type of mutation. So that, I think, is a really important concept to get across as well.

So that's a driver mutation. Passenger mutations are generally considered benign, and they're just the opposite of that. They do not alter the phenotype of the cell, and so selection can't really operate on passenger mutations because there aren't phenotypes to select. Passenger mutations, as we've discussed a little bit this morning, can be the result of a particular mutator phenotype. So again, a tumor with a mismatch repair deficiency will accumulate many, many point mutations. Most of those are probably going to be completely benign; the vast majority won't be doing anything. It's just a result of a compromised ability to repair those misincorporations of nucleotides during cell replication. Similarly, translocations accrue in cells with homologous recombination deficiencies. They just accrue, and they may not have any biological effect either. It's just a result of repair through non-homologous end joining, for example.

Okay, so this is an important concept and you'll hear it over and over again. You probably already know about this stuff. So let's think about these different types of driver mutations. This invokes the idea that driver mutations can have a temporal aspect to them. It's not just the tumor-initiating type of mutation; drivers can be acquired over cancer progression, and these mutations can then lead to important phenotypic changes such as acquisition of metastatic potential. This paper here gives a very nice little overview of mutations that are known to, for example, confer chemotherapeutic resistance. The classic example is with tyrosine kinase inhibitors: acquisition of KIT mutations confers resistance in gastrointestinal tumors, and then there's this classic EGFR codon 790 mutation that gives rise to resistance in lung cancers that are treated with tyrosine kinase inhibitors.
So I think, just to be clear, the role of a driver mutation can have a temporal aspect. It can be a tumorigenic driver, it can be a driver of chemotherapeutic resistance, it can be a driver of metastatic potential. And from where these mutations are placed on that phylogenetic tree, we can start to infer which type of driver mutations they might be.

So given all this information: we're all probably engaged, you wouldn't be sitting here if you weren't, in some level of sequencing or analysis of cancer genomes. This is the classic Hanahan and Weinberg paper in Cell that described these hallmarks of cancer. You could think of these as phenotypes of cancer; these are the phenotypes that are shared by almost all malignancies. What was glaringly absent from this is how these phenotypes are acquired, and it almost invariably has to do with changes in the genome. That can be through mutation, copy number, even epigenetic change, etc., with some exceptions for exogenous factors like viruses that come in and drive malignancies. But for the vast majority it's changes in the genome, and so naturally sequencing cancer genomes will give us insights into their biology. There are some major initiatives underway, like the TCGA, where the first phase is largely complete, and then the International Cancer Genome Consortium, in which this institute here is heavily involved. So we will likely see somewhere on the order of a hundred thousand cancer samples sequenced to some degree in the next five years and deposited in databases. That's a huge amount of data from which we will gain,
I think, tremendous insights into cancer biology.

So here's just a synthesis of this in action. This is a paper that came out at the end of last year describing the pan-cancer analysis of the initial phase of the TCGA. What this figure shows is essentially the frequency of mutation in 127 genes that were found to be more mutated than you'd expect by chance in these populations of different tumor types. So we have bladder cancer, breast cancer, colorectal cancer, AML, lung cancer, ovarian and uterine as well. So there's a whole spectrum of mutations. Here, unsurprisingly, is p53. This is the most mutated gene in the human cancer genome, and it is the cancer tumor suppressor. Then you have PI3 kinase mutations over here, associated with uterine cancers, and for example here you have the APC gene, which is highly mutated in colorectal cancers. So this gives you a really nice overview of the types of biological processes that are impacted in different cancer types.

I think one of the really surprising things that came out of this, and we wouldn't have known this had the community not sequenced this number of tumors, is that biological processes such as histone modification and even pre-mRNA splicing are implicated in a number of different tumor types. These are global processes of genome integrity and, obviously, of protein generation, so you would expect that if you disrupted those processes it would lead to a deleterious phenotype and those cells would just die, because they're such core processes. But in a number of tumors we now see histone modification as a major contributor to the cancer biology. One starts to think of this as very analogous to DNA repair abnormalities, because you've got the ability then to really manipulate the 3D conformation of how the
genome is packed in the cell, and that has downstream impacts on transcription. So this is, I think, a really important result in the sense that whole new sets of biology have been revealed through mutational analysis in large sets of tumors. Any questions?

Okay, so the clinical utility of cancer genome sequencing is, I think, just beginning to bear fruit. There are a number of parallel efforts in both industry and academia to reduce this type of thing to practice, whereby we could develop panels of genes to sequence in the context of clinical care that could then be used to inform whether a particular therapy would be useful in a patient. This idea of companion diagnostics has already been well described. EGFR mutations indicate tyrosine kinase inhibitors. BRAF V600E mutations, for example in melanoma, have been used as a companion diagnostic to administer... I can never pronounce the name. Vemurafenib. For some people it rolls off the tongue; not for me. Anyway, it's a very difficult name to pronounce, but this is a targeted inhibitor against this particular mutation, which happens in 70 percent of melanomas, and it often elicits a response. But then there's typically a relapse later on, because it probably induces a massive selective pressure that selects for resistant clones. So this had really great promise in the beginning because it was showing really incredible responses, but those responses typically are not durable, and we want responses to be durable. The problem with these targeted inhibitors, generally speaking, across the board, has been that they select for resistant clones. Evolution outwits us in that regard.

Then we have biomarkers of resistance to therapy. I already mentioned secondary EGFR mutations, and this is very likely what's happening with these BRAF
inhibitors as well. There's a nice website where you can get a list of targeted therapies against known mutations, so I recommend checking that out.

There's a very exciting development in the world of ovarian cancers associated with homologous recombination deficiency. There's a class of new drugs called PARP inhibitors, and as I said, they work in a synthetic lethal capacity in patients' tumors where homologous recombination is impaired. This is really transforming the field, because there have been no real developments in high-grade serous ovarian cancer, which is the most lethal gynecologic malignancy that we have. The outcomes haven't changed in 30 years. It's really astonishing, really remarkable: you think about all the progress that's happening, and that disease is still something that we can't come to grips with. But there's this new class of drugs, and I think AstraZeneca will be applying for FDA approval in September for their compound olaparib. But we need to know whether homologous recombination is actually defective in those patients before administering the drug, because otherwise it won't work.

There have been instances of really remarkable mutations that are acquired. About 20% of ovarian cancer is associated with germline BRCA deficiencies. It's the same gene that was made famous by Angelina Jolie, who did her hereditary test and decided to have preventive surgery. In fact, ovarian cancer is much more of a BRCA disease than breast cancer is, but it doesn't get the day in the sun that it needs. So I'll just plug the need to study ovarian cancer. But nonetheless, what's really interesting about this is that in rare cases,
women with germline frameshifting BRCA mutations have tumors that acquire additional somatic mutations that restore the reading frame. This again speaks to the temporal nature of a driver. That deficient BRCA gene may be very much responsible for tumorigenesis, but then, through a selective pressure induced by therapy, the tumor restores its capacity for that repair process in a subpopulation of patients. So knowing that in the context of administering PARP inhibitors is very important. Okay.

All right, so let's get into the actual data on somatic point mutations. This is an actual sequence from a p53-mutated tumor. This is a triple-negative breast cancer that has a G-to-T substitution at this particular locus, which induces a premature stop codon and results in loss of function of that protein in that cell. As was mentioned, p53 is a tumor suppressor gene; it's involved in programmed cell death, in DNA repair, and in a number of other core processes. It's probably the most well-studied gene in the human genome from the cancer perspective, and it gets implicated in all kinds of biological processes because of that.

So, given that we're talking about these point mutations, there are a number of different classes. One can think about a missense mutation. This is a single base substitution that alters the amino acid sequence of a protein. Here you have an A-to-C change, and that changes this amino acid to this amino acid here. We can have silent mutations in the coding space, which are called synonymous mutations. These are single base substitutions that don't change the amino acid sequence of a protein. So if one were to place a bet on which one would be a driver
mutation versus a passenger mutation, you'd make the assumption that the missense mutations are much more likely to be driver mutations. There are now reports, though, of many silent mutations that actually have an effect, and this can be through induction of cryptic splice sites or through some other type of regulatory impact. So if you have exonic sequence that doubles as an enhancer or promoter, changing that sequence can have an impact as well.

Then we have nonsense mutations, which are called truncating mutations. These are single base substitutions that introduce a premature stop codon. And then you can have frameshifting mutations, where you have removal or insertion of genetic material, which usually ends up resulting in a premature stop codon because you change the reading frame of that transcript. So these are the major classes of point mutations that you'll see.

Here's an example of a missense mutation that we discovered. It's in a rare form of ovarian cancer called granulosa cell tumors of the ovary. This was found just by sequencing, and I wouldn't recommend that this is the way things are done, but this was sequencing transcriptomes. We noticed that there was a common substitution at the same locus in four cases, and that induced a cysteine-to-tryptophan amino acid change. What was really quite interesting about this gene is that it is involved in granulosa cell differentiation, and that was only just discovered right around the time we found this mutation. We did some literature searching, a paper emerged, and it all made sense; it fit. So then we went and looked at a large series of these cases by trawling tumor banks across the country and across the world, and found that almost all cases harbored a somatic mutation.
It's exactly the same mutation. So it is the pathognomonic mutation that defines the disease, and it is the transformative event in the disease. It's very likely that in my career I'll never see anything like this again. It was just a lucky hit, but it is kind of the most extreme example of a mutation that's driving a phenotype: one base change changes the phenotype and creates a malignancy. It's now used as a diagnostic in multiple different countries, and people are starting to develop therapeutics against it.

Virtually all. We've found it in something like 95%. Yeah, I expect the rest are false negatives or misdiagnoses.

So these missense mutations can often occur in hotspots in a gene. I mentioned PI3 kinase mutations; this gene is involved in phosphoinositide signaling, and mutations in this gene tend to cluster in two different places. One is at the 1047 locus and one is at the 545/542 locus, and these are the places where you expect to see the mutation. So if you were to design an assay to look for PI3 kinase mutations, you could probably just look at those two hotspots, and that would get you 95% of the PI3 kinase mutations that are active in the cell. There are other well-known examples like this, where you have KRAS codon 12 in pancreatic cancer, and I already mentioned the BRAF V600 mutations in melanoma. When these mutations occur in BRAF, they're always at 600. It's always the same one. So these can be really useful because they can be used as diagnostics and also indicate, as I said, whether a patient should be put on a particular therapy. These are what we call activating or oncogenic mutations.

All right, I don't think I need to go over this. So here's where PI3 kinase sits in the signaling pathway.
So here you have PTEN, and this drives all the cell cycle progression and downstream growth signaling pathways. You disrupt the pathway at this place and it has a cascading effect through AKT down all of these different processes.

Okay, so in contrast to that, this is the typical pattern that you might see in a tumor suppressor gene. This is the discovery we made in clear cell cancers, or endometriosis-associated ovarian cancers. What's shown here is the protein, and these are basically the amino acids that are hit by all these different mutations. Generally speaking, these are stop codon mutations or frameshifting mutations. When you see a pattern like this across a population of tumors, it's a strong indicator that that's a tumor suppressor gene, because you have multiple ways of inactivating it. You'll see this characteristic pattern in p53, you see it in RB, you'll see it in PTEN. Yeah?

Well, that's the point: this is recurrent by gene here, not by locus. In some cases they'll be at the same locus, yeah, because there are only some places where you can actually get a truncating mutation. Right, the mRNA sequence has to be such that you can actually induce a stop codon, and that's limited to a restricted set of positions.

But so then my question is: if you're looking at a number of patients for ARID1A mutations and running software that looks at significantly mutated genes, will this show up?

Oh yeah, sure, because what that software does is essentially look at some region of the genome, it could be a gene, and ask what the likelihood is that we would see this many mutations in that region. It's not base-pair specific, it's region specific.
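The region-level question just described ("how likely is it to see this many mutations in this region?") can be sketched with a simple Poisson model. This is an illustrative assumption, not the method of any particular significantly-mutated-gene tool; real tools use more elaborate background models. The gene length, cohort size, and background rate below are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the tail."""
    if observed <= 0:
        return 1.0
    # Work in log space for the first term to avoid huge factorials.
    log_term = -expected + observed * math.log(expected) - math.lgamma(observed + 1)
    term = math.exp(log_term)
    total = 0.0
    i = observed
    # Add successive terms until they no longer change the sum.
    while term > 0.0 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= expected / i
    return min(total, 1.0)


def gene_mutation_pvalue(n_mutations, gene_length_bp, n_tumors, background_rate_per_bp):
    """Chance of seeing at least n_mutations anywhere in the gene across the cohort."""
    expected = gene_length_bp * n_tumors * background_rate_per_bp
    return poisson_upper_tail(n_mutations, expected)


if __name__ == "__main__":
    # Hypothetical numbers: a 7 kb coding region, 100 tumors, 1 mutation per Mb.
    print(gene_mutation_pvalue(20, 7_000, 100, 1e-6))
```

Because the count is pooled over the whole region, 20 truncating mutations at 20 different positions score exactly the same as 20 mutations at one hotspot.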
They don't have to be recurrent at the same point? No, they're recurrent by gene, though. They have to be recurrent by gene. Okay.

The other thing about this is that this is where integration of copy number changes comes in. We often see that these types of mutations will be co-occurring with a loss of heterozygosity, or if a patient doesn't have a mutation then they may have a homozygous deletion of that gene, for example. So that's again where different types of biological features can be brought in. And we see similar patterns for p53, BRCA1, BRCA2.

Yeah, sure. I should mention that each one of these dots is from a different tumor, right? So this is just the population-level synthesis of this data. It basically suggests that one can inactivate this gene in many different ways, and as long as that gene is inactivated, that phenotype is what gets selected. This is a gene involved in the SWI/SNF complex. It's another one of these histone modification processes. So in these endometriosis-associated ovarian cancers, something about disrupting chromatin packaging is advantageous to those cancer cells. We're still trying to work that out. This discovery was made four years ago, and there are a lot of parallel efforts going on to try to work out the mechanism.

This is different, very different. p53 is well known, the apoptotic side; this is chromatin modification. The missense mutations in p53, though, are all in the DNA-binding domain. You never see missense mutations in those.

So then here's just an example of some more hotspots. This is from that Vogelstein review, and it just shows it in a different way. So here's RB1, a tumor suppressor gene.
You have mutations spread throughout. This is VHL, another tumor suppressor gene, again with mutations spread throughout. But then, for example, the IDH1 mutations in glioblastoma are all at the same locus, and PI3 kinase has hotspots. So again, this is from that review. You could probably get a lot of the information that I'm telling you now just from reading this review; the same concepts are reiterated there. So I really recommend that you read it.

Okay, so then actionable mutations. There are a number, and I've listed and talked about some of them already.

So then let's look beyond genes. There's an emerging concept, because of the ability now to sequence whole genomes, whereby summary measurements from looking at mutations across a whole genome can tell you something about the phenotype acting in that cell. That can tell us, for example, about mutation rate. How many mutations are in a genome is probably an indication of which DNA repair proteins are disrupted in that tumor type. And then there's this concept of mutational signatures, which is that the substitution patterns can tell us something about the mutational mechanism that's operating. So let's first look at that.
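To make "substitution patterns" concrete, here is a minimal sketch of tallying single-base changes into classes. It assumes the common convention of reporting each change relative to the pyrimidine (C or T) of the base pair, so that G-to-T on one strand is counted as C-to-A; that convention and the toy input are assumptions for illustration, not something stated in the talk.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base change to one of six classes with a C or T reference."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Fraction of mutations in each substitution class for one tumor."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


if __name__ == "__main__":
    # Toy (ref, alt) pairs, purely hypothetical.
    toy_calls = [("C", "T"), ("G", "A"), ("C", "A"), ("T", "C")]
    print(spectrum(toy_calls))
```

A tumor whose spectrum is dominated by one class is the kind of pattern that points toward a particular mutational mechanism.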
So, the substitution patterns. This is again a synthesis of several different tumor types. This paper came out last year, and it shows, for example, that here you have lung cancers with a preponderance of C-to-A mutations, and that's likely due to the insult of tobacco smoke in the lungs. That is a specific mutagen that changes a cytosine to an adenine. On the other side of the picture we have melanomas here, which are C-to-T enriched. The vast majority of mutations in melanomas are C-to-T, and that's a cytosine deamination due to UV exposure. So it all makes sense: you have these environmental insults that create a specific mutation pattern, and then you can have different endogenous causes for this as well.

What this has really started to do is let us classify tumors both by their mutation abundance, which is shown on the radial axis going up here, and by the specific substitution patterns, which are around this donut. So one can start to class tumors as, say, C-to-T tumors, and that gives us some indication of the mutational mechanism that's operating.

The number of mutations is quite interesting as well, and we can see real variance from tumor type to tumor type. So here's AML. This is showing the number of mutations per megabase, and it's way down here.
It has between 0.01 and 1 mutations per megabase. But then you go up to the lung cancers, and especially the smokers who acquired lung cancers, and they have more like 30 to 40 mutations per megabase. So that's several orders of magnitude more mutations going on in lung cancers than in the AMLs.
All right, so this has important implications for potentially stratifying tumors. This was really nicely demonstrated in the endometrial cancer marker paper from the TCGA. These are uterine cancers, a disease that was more or less considered one thing: you take this population, you say okay, uterine cancer, and you treat it all in the same way. What they found is that these tumors can clearly be classed into four major subtypes, and those subtypes co-segregate with the mutation pattern. So here you have a class of tumors that have POLE mutations, and that leads to massive numbers of mutations. These are the ultramutated cases. Again, this is on a log scale, and you can see that the number of mutations per megabase is upwards of 500 in these cases, which far exceeds the other tumors. Then we have the microsatellite instability type, and these have different substitution patterns compared to the ultramutated cases, so they can be classed as a different tumor type. And then you have what they call copy number low and copy number high. These ones have few mutations, but they have lots of copy number changes. Okay, so this speaks to the idea that typically there's one DNA repair abnormality operating in a particular tumor. And what's really important about this is that these different classes actually track with outcomes.
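To make the two whole-genome summary measurements above concrete, here is a minimal Python sketch. It is an illustration only, not code from the talk or from any published tool. The function names, the toy mutation list, and the callable genome size of 3,000,000,000 bases are all assumptions chosen for the example. It folds each substitution onto the pyrimidine strand to get the six usual classes (so G-to-A is counted as C-to-T) and computes mutations per megabase.

```python
from collections import Counter

# Complement table, used to report every change from the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return one of six strand-collapsed classes, e.g. 'C>T' (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: flip to the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Fraction of mutations falling in each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def mutations_per_megabase(n_mutations, callable_bases):
    """Mutation burden: number of mutations per million callable bases."""
    return n_mutations / (callable_bases / 1e6)


if __name__ == "__main__":
    # Toy (ref, alt) pairs, invented for illustration.
    toy = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "T"), ("T", "C")]
    print(spectrum(toy))
    # Assumed genome size; real analyses use the number of callable bases.
    print(mutations_per_megabase(90_000, 3_000_000_000))
```

A spectrum dominated by one class, together with the burden, is the kind of summary the donut plot and the per-megabase plot show; where to draw cutoffs between classes is a separate question that this sketch does not address.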
So these POLE ultramutated cancers: none of them die. Nobody dies. They're malignant tumors, but they can be cured by surgery and those patients are fine. That is very, very different from these copy number high cases, whereby after five years only 50% of patients are alive. And this is being borne out now. Some colleagues of mine have gone back and tried to validate this, and it's true. If you look at the outcomes of the cases with POLE mutations, the patients have either died of other causes or they're still alive. So it speaks to the ability to use the whole genome as a way of stratifying patients with different phenotypes. Okay, all right.
I know, I think that actually was a misconception. I think it's coming out now that the hypermutated cases somehow have a protective effect, maybe because of immune surveillance: it triggers a bigger immune response. And I think it's being borne out in several different diseases now that a high somatic mutation rate may actually have a protective effect.
Okay, so this might be a good time to take some questions.
Not really. I mean, I think it is disease dependent, and this TCGA paper just came out a year and a half ago. So I think a lot of people are now working on trying to generate clinical diagnostics that could subtype these patients, but it's still in development.
It takes some time. And really, we've only had data sets in a research context to explore this for the last three or four years, so I think it's going to take some time for that to mature. But eventually, yeah, I think we should get there.
I don't recall the basis. Yes, I don't either. Typically these initial marker papers try to use statistics where possible, but sometimes it's just ad hoc: they make a cutoff and say, okay, this group is above my line, so I'm going to classify it as this, and this group is below the line, so it's classified as something different. You can imagine you can easily compute this, but the problem is that it's going to be a continuous spectrum. So discretizing that into categories should probably be done by ascertaining whether there actually is a mismatch repair protein abnormality that's yielding those mutations or not. If you have some cause for it, then it makes sense. So you have the high mutation rate, but if you can also associate it with a cause in that tumor, then that's a much better and more reliable way of doing it.
Okay, all right. So now let's look at how these mutations can be detected in primary data. That was the biological overview, and now we'll go into some of these aspects. I've already mentioned this in the morning, but it's worth re-emphasizing: tumor-normal admixture, intratumoral heterogeneity, genomic instability, and also the experimental design to capture mutations is quite important. So this really necessitates the development of cancer-specific tools to analyze mutations.
So there's a lot of machinery out there to analyze normal genomes. The SAMtools paper is probably one of the more cited papers of the last few years, because the package is so ubiquitous in terms of people using it for exploration of normal genomes. And in the initial days a lot of people were using that package to study tumors, because there just wasn't anything else out there. So I'm going to go through a recent history of how somatic mutation detection has evolved over time, to now, where we have fairly reliable and pretty good tools to do this. And you'll be looking at some of those tools in the lab.
So of course the first step is that when we get FASTQ data off the sequencer, it looks something like this, and we need to align it to a reference sequence, and we get something that looks like this. What's shown here in black are the nucleotides that match the reference, and in red are the positions that don't match the reference. To get to this stage, and you've probably already covered this, there's a huge literature on alignment software, and a couple of pieces of software have risen to the fore, mostly because they work and they don't crash, I think, more than because of any algorithmic sophistication. They work, people can use them, and they actually run fast. By no means is the most popular alignment algorithm the very best algorithm. I would say that the very opposite is true.
So there are algorithms such as Novoalign, which try to do a principled alignment of all the data, that are probably yielding much more accurate results. But nonetheless the field has adopted tools like BWA as a de facto community standard, if you will.
So we entered into this domain of trying to infer mutations from tumor-normal genomes or exomes around the year 2010, and developed two methods that were published. This method here, JointSNVMix, was actually the first somatic-mutation-specific variant caller in the literature, or one of the first. And then we subsequently developed MutationSeq, which has different properties. So I'll just walk through how we came up with the assumptions that we incorporated into these methods.
So let's begin with this notion that we have two data sets when we're doing somatic mutation detection. That right off the bat changes the game completely. You can imagine you could analyze the tumor exome or genome and the normal exome or genome, and then do some sort of subtractive analysis. You could say, okay, I'm going to call something somatic if I see it in my mutation list derived from the tumor and it's not in my mutation list derived from the normal. Well, that turns out not to work very well, and that's because when we sample these alleles, the sampling is always incomplete.
When we're building a library, for example from the normal, imagine you have a heterozygous polymorphism here, and that's what's shown in blue. So about half of your reads will show it. The first thing that we do is project the nucleotides onto a numeric vector space, where one number is the count of reads that support the reference and D is the depth, the total number of reads at a given position. We can then use these numbers because they allow us to use really nice standard statistical frameworks like binomial distributions. So let's think about it as coin flipping. You're going to flip a coin many times, and it can be heads or tails, and there's the bias of that coin. Say a coin that's very biased towards heads will reveal a pattern whereby most observations will be heads. And if you see 50/50, then you can imagine that's an unbiased coin, and that might represent a heterozygous position. So you can start to model this with standard statistical distributions.
But the most important thing here is that these two data sets are in fact highly correlated. Most of the variant positions are going to be germline polymorphisms that are shared by both the tumor and the normal, so the somatic mutation rate is far less than the germline polymorphism rate. It may be several orders of magnitude less, in fact. So the somatic mutation rate you see may be small.
It may be one one-thousandth of the germline variation rate. So one can take advantage of that and try to borrow statistical strength from these highly correlated data sets. And that's what we did in this algorithm. We took these allele count measurements and were able to classify them into essentially nine different groups, by taking the cross product of the genotype state space in the tumor and the normal, and then tried to infer, given the observed counts, the most likely state to have given rise to that data. So here you can see the red one: all the reads are reference in the normal, but then half the reads are variant in the tumor, and this is the type of signal that we're trying to find. That corresponds to AA, AB in that table there, and that gets the highest probability, if you will. So this was rolled into a statistical model called JointSNVMix, and you can read about the gory details, but this is a generative probabilistic model that tries to simultaneously analyze these two data sets in order to determine this genotype state. We don't have time to really dig into this, but suffice it to say that treating the data simultaneously gives us a tremendous advantage over treating them independently, which is what most people were doing at the time. And that's borne out in accuracy metrics.
Yeah. This is exome data. Yeah.
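The nine-state idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the JointSNVMix model itself: the per-genotype reference-read probabilities in `P_REF`, the germline prior in `GERMLINE_PRIOR`, and the `somatic_rate` of one in a thousand are hypothetical values picked for the example, and the real model learns its parameters from data. The sketch scores each of the nine (normal, tumor) genotype pairs with a binomial likelihood on the reference-read counts and a prior that strongly favors the tumor matching the normal.

```python
from itertools import product
from math import comb

GENOTYPES = ("AA", "AB", "BB")

# Assumed probability that a read shows the reference allele under each genotype.
P_REF = {"AA": 0.99, "AB": 0.5, "BB": 0.01}

# Assumed prior on the normal (germline) genotype at an arbitrary position.
GERMLINE_PRIOR = {"AA": 0.998, "AB": 0.001, "BB": 0.001}


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def joint_posterior(a_normal, d_normal, a_tumor, d_tumor, somatic_rate=1e-3):
    """Posterior over the 9 (normal, tumor) genotype pairs.

    a_* is the number of reference-supporting reads, d_* the depth.
    """
    weights = {}
    for g_n, g_t in product(GENOTYPES, repeat=2):
        # The tumor usually inherits the normal genotype; a change is rare.
        transition = (1 - somatic_rate) if g_n == g_t else somatic_rate / 2
        prior = GERMLINE_PRIOR[g_n] * transition
        likelihood = (binom_pmf(a_normal, d_normal, P_REF[g_n])
                      * binom_pmf(a_tumor, d_tumor, P_REF[g_t]))
        weights[(g_n, g_t)] = prior * likelihood
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


if __name__ == "__main__":
    # All 30 normal reads are reference; half of 30 tumor reads are variant.
    posterior = joint_posterior(a_normal=30, d_normal=30, a_tumor=15, d_tumor=30)
    best = max(posterior, key=posterior.get)
    print(best, posterior[best])
```

With these toy counts the highest-scoring state is normal AA, tumor AB, the somatic pattern described in the talk. A position with roughly half variant reads in both samples scores highest as AB in both, which is a shared germline polymorphism.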
Yeah, but then we did some evaluation on, actually, these are genome scale. So this is 11 diffuse large B-cell lymphomas that we did the accuracy metrics on, and in both the germline case and the somatic case we are able to do a better job than if we were to look at the data sets independently. Again, I'll refer you to the paper for that.
So then we got faced with an issue, because we thought we had really cracked the nut on this problem. But then we did an analysis on this triple negative breast cancer data set that we had generated, and we called three thousand mutations using this method, and the majority of them didn't validate as being real. So that was a bit of a problem. We did better than we did before, but this is still pretty bad. So we thought, okay, what can we learn from this? And you can see here that allele counts, and I showed this picture of projecting onto allele counts, are completely insufficient given the noise in this system.
So here, for example, you have a case where you have reads that harbor variants. This is the tumor sample up here, this is the normal sample down here, and you can see that there is this nice signal of mutations in the tumor and not in the normal. So this was called as a mutation. But we went to validate it and it turned out not to be real. You can take these reads, and they actually align almost equally well to other places in the genome. So this is a misalignment induced by this variant here, and these reads don't belong at this locus; they belong somewhere else. So that's one way in which we can get false positives.
Yeah. Resequencing, yeah, yeah.
Here is a set of false positives that are induced by indels.
So here is probably a germline indel, that's here, and you can see... So you've had some exposure to IGV already, right? Okay, all right, good. So this is what an indel looks like, and you can see that in these three reads the indel is basically not getting placed properly, and it's causing the tail of these reads to be put in the wrong place. Again, if you're just looking at this particular position, it looks pretty good. There are these three reads here with the mutation, and these reads here in the normal don't have any evidence of the mutation. But if you look at the context you can see there's all kinds of noise around it, and it's very likely just a misplacement of an indel in those reads that's giving rise to that signal.
Yeah. This is just the BWA alignment, so this isn't local realignment or anything like that. Yeah.
Okay, so here's an example whereby, and it's probably hard to see but maybe it came out on the printed page, you have very low quality substitutions. There are several reads with a mutation here in the tumor and not in the normal, but they have very low base quality. So in fact the machine, the Illumina base caller, is not very confident in those bases. So it's probably a sequencing error that's given rise to those.
And then we have the example of something called strand bias. Here all the reads that contain the variant are sequenced in the same direction, so you do not see bidirectional evidence of the variant. It's very likely an artifact due to PCR, allele-specific PCR or something like that.
Okay, and then you have examples like this one. Sorry, I've inverted them here: this is the tumor, that's the normal. For this one there's really nothing to explain it. There are no indels. The strands are nicely balanced. You don't have low base quality; they're high base quality.
You don't have evidence of misalignment. But this is a mutation that was predicted and not validated, and this could speak to the validation technology maybe not being perfect either. But there's something about those that is maybe not explained. So this really prompted some head scratching, and we thought, well, what can we do to learn from this experience? We've got this nice labeled data set. Can we use classification-based tools to learn something about the features that give rise to false positive versus true positive mutations? And in fact we had true positive examples that had very, very low signal as well, but these turned out to validate. So even this one here, where you have maybe just two reads that show evidence of the mutation, turned out to be real.
So we embarked on this project called MutationSeq, whereby we extracted and computed features of the data. In the initial paper we calculated 40 features from the tumor, 40 features from the normal, and then created 26 joint features, which were mostly ratios of tumor to normal and sums across tumor and normal. These included things like base quality, mapping quality, the presence of homopolymer runs, strand bias as I've already said, et cetera. This was similar in concept to GATK's VQSR, but here we use tumor and normal simultaneously. So that's 106 features. If you project those onto the three major principal components using principal component analysis, and then color the data points according to whether they're somatic, germline or wild type, you can see that there really is a nice separation: the majority of the somatics can be separated from the other two classes. So this gave us some confidence that we should be able to learn a classifier using these features that can distinguish these, one that
goes well beyond just the allele count modeling that was done in JointSNVMix. So we embarked on this, and through cross-validation analysis we were able to demonstrate that this type of approach far exceeds the standard of the time, which was to use either SAMtools or GATK. This is a receiver operating characteristic curve showing sensitivity plotted against specificity, and we're achieving very, very good results, much better than that 30% validation rate in the first round. So this gave us some confidence to move forward.
What we did then is go back and try to explain what is giving rise to these false positives. We clustered the data according to these features, and we found that there were several different classes of false positives. In this group one, we found that they were induced by strand bias, unequal mapping qualities in the tumor and normal, and low confidence genotypes. Then we had another group associated with a common thing that happens in the context of GGT trinucleotides, where you often get the misincorporation of a G base instead of a T, so this group was characterized by that. Then we had another group that was due to misalignments and repetitive sequence, and another group that had low base quality but also had the GGT-to-GGG error. And then there was this group. If you look at this profile heat map, the somatic mutations are down here, and this group actually has a profile very similar to the somatic mutations across these features, except there are very, very weak signals of the variant in the normal. So it suggests that these are probably miscalled germline variants most of the time, and they should be called germline and not somatic.
So we have these different classes for explaining false positives. The point of why I'm showing you this is that mutation calling is actually a complex process. We might think it's quite simple: we just count the number of alleles in the tumor and in the normal, and we say, okay, that's a somatic mutation. But the machine artifacts vastly outweigh the biological noise of germline variation in the data. A number of groups were on kind of a parallel journey of discovery, if you will, that was happening at probably ten different centers around the world, and this is our effort in that context.
Okay, so how are we doing for time? Break at three. Oh, we've got plenty of time. Okay. Yes?
So you first call all the mutations that you can, and then...
No, good question. So this is classic: you need a training set. On the training set you learn the classifier, and then you apply that to a test set.
But you already have the training...
Yeah, so we release the result of the training as part of the software package. That model is derived from essentially learning the classifier. Since those 3,000 mutations we've actually added about 10,000 more, so it's now trained on 13,000 or so. It gets a little bit better every time, sometimes a little bit worse.
But still, I don't know, you call all the mutations and then you classify the mutations here?
Oh, I see.
Yeah, so the initial training set was derived by just applying JointSNVMix, basically using naive ways to call mutations. That's still unlabeled data, so then we went and did experiments to get reliable labels. Once we have labels, that makes it amenable to learning the classifier. And you can use any kind of discriminative classifier: logistic regression, random forest, Bayesian additive regression trees, naive Bayes, whatever have you. The point is that using all these features improves things. The classifier method is less important.
Also, in the program...
Oh, so when you actually run the program, it just runs from the tumor and normal BAM files. When you see this in the lab, when you actually go to run the program, the inputs are literally just the tumor BAM file and the normal BAM file. The output is a VCF file that contains the somatic mutations, and they have a probability associated with them. So you can use that probability as a quality score to filter off low probability calls.
Yeah, the tumor-normal...
Yeah, so, I mean, obviously if you don't sample a mutation due to depth you're going to miss it, right?
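As a rough illustration of using the reported probability as a filter, here is a small Python sketch that keeps only VCF records whose probability meets a threshold. It is not part of any caller. The INFO key name `PR`, the threshold of 0.9, and the example records are assumptions made for the sketch, so check the header of your own VCF for the key your caller actually writes.

```python
def parse_info(info_field):
    """Turn an INFO string like 'PR=0.97;TC=ACA' into a dict of strings."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def filter_by_probability(vcf_lines, key="PR", threshold=0.9):
    """Keep header lines and records whose INFO[key] is at least threshold.

    `key` is a hypothetical INFO tag holding the somatic probability.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):  # header and column lines pass through
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the 8th VCF column
        if key in info and float(info[key]) >= threshold:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Invented records, for illustration only.
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t12345\t.\tC\tT\t.\tPASS\tPR=0.97",
        "1\t22222\t.\tG\tA\t.\tPASS\tPR=0.42",
    ]
    for line in filter_by_probability(example):
        print(line)
```

The threshold trades sensitivity for specificity, so the value that suits one data set may not suit another.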
I mean, certainly depth plays a role, but we're able to find mutations at 30x that are in the range of 10% allele fraction, so three reads.
Yes?
Sure, sure. I think that would be a case where it would be useful to train on high depth data, which we actually have a lot of because we've done targeted resequencing. One of the future rollouts is to have a deep mode for when you do have this high depth. It's really for panels at 500x, and some people are even doing exomes at 500x. Yeah, no, that's a good point, absolutely. The vast majority of this is trained on exome or genome data at about 50x coverage. Our motivation, for my own lab, is that I was generating a lot of genome data and wanted a tool that would work on genome data and for which I understood every step of the analysis process. A lot of times you download third-party software and it just doesn't match up with what the paper says, so it becomes impossible to interpret exactly what the outputs are. From my perspective, I need to deconstruct everything right down to the mathematical assumptions and make sure that's implemented exactly as I expect. That's not going to be the case for everyone in this room, but it may be the case for some of you. That was what was important to me, so I've invested a lot of time and developer time into creating a package that we can use and interpret in a meaningful way.
The various callers weight things in different ways when they're making... Yep. Yeah. And so your random forest, for example, out of the 40 or whatever features, will tell you which ones have the most weight for discriminating? That's right. Are there particular ones that stood out at the top?
Yeah, so again, things like strand bias. We have a feature called cross entropy, which is essentially the amount of entropy that exists across all the reads, and that turns out to be quite important. And then of course the likelihoods from these binomial distributions tend to be quite important, but those essentially take the depth into account, right? The more observations, the more peaked the likelihood is in those statistical distributions. And that actually breaks down at some point, because in binomial modeling you have that parameter that says this is the expectation of seeing a particular event, which is the variant, and once you get lots of observations, small deviations away from that expectation result in very low likelihoods. So that's undesirable. One can use overdispersed types of distributions to accommodate that, which we had to do on occasion.
How correlated are your 26...?
Oh, that's a good question. Yeah, so some of them are highly correlated. They're not all independent. For example, mapping quality is actually related to base quality, because it incorporates base quality. So there are some that are highly correlated.
And how did you assess the importance?
So random forest essentially outputs feature importance based on the stochastic resampling of the trees. It asks, in that resampling of the trees, how often a feature is at the top of the tree for discriminative capacity. It's essentially like creating decision trees in different ways multiple times, and you get a distribution: you can ask how often you get a meaningful split on the feature. And in fact there are the ones in the tail of that distribution.
Those consistently are not important.
But I think random forest is a little biased towards...
You're absolutely right. There are ways to then post-process that, or ultimately you can imagine that you could actually do Bayesian structure learning. But what we've learned from this is that the vast majority of the game is just in computing features, and then the method of classification becomes less important.
Yeah. This is in cross-validation, and then in the paper you'll see that there's a test set that's held out. It's just a held-out set that we never actually trained on or touched. And actually, this is primarily trained to work on exomes, and that data was genomes, even on a different platform, SOLiD platform genomes, and despite that we still got, not quite this good...
Well, we're not in first place. So probably not, but I think the DREAM challenge is too synthetic. Yeah, there is a real stage at the end. We're in there; we're in the top five or something. But yeah, a lot of groups have put out nice software packages. This is really for educational purposes, meant to be about the concepts: one needs to take all these features into account. And there are different software packages, which I'll point you to later on, that are much more popular than our own tool. We use it internally for the reasons I stated: I like to understand, from the very raw data all the way through to output, how that was calculated, so I can have my own confidence in it. But there are software packages that do a good job with this.
Oh, yeah, that's a really excellent point. I have a tight collaboration with Paul Boutros. Is there anybody from Paul's lab here? And so we're trying to actually sort out exactly that question.
So, one of the best practices for taking the whole genome and finding mutations is doing rigorous benchmarking. The DREAM challenge is one way, and then the TCGA and ICGC have their own internal benchmarking efforts. The intent is to do exactly that: just keep feeding in new features and then running these benchmark data sets. And hopefully the amount of validation and ground truth data will grow over time, such that essentially there will be some parity across all of the mutation callers, and then we can just put it to bed.
Okay, good. So how are we doing? I think we're getting there. Yeah, yeah, okay, so I can go through the last part pretty quickly.
Let's just talk about some available tools. I already mentioned SAMtools, and like I said, this has some very nice community standards associated with it. I think Heng Li, who developed this, should get a nice award sometime soon for a massive contribution to the field. He's just a really good contributor. He's made all of his code completely open to the public, and I think we all benefit greatly from this.
Here's the GATK toolkit from the Broad, and this has some nice features in terms of fixing alignments, if you will, through local realignment and things like that.
And then one of the more popular tools for somatic mutation detection is MuTect, and this comes out of the Broad Institute as well. You can see here that there are a number of concepts that overlap. We have things like insertions and deletions, proximal gaps as they call them, strand bias, mappability, things like a triallelic site, and then the distribution of the tumor and normal allele counts. So all of these concepts that we've published in various capacities have now been rolled into a very nice piece of software.
It's been used to drive a lot of the TCGA; a lot of the data sets that are out there are probably a result of MuTect. And they did some nice benchmarking to show the sensitivity of the algorithm according to sequencing depth and also according to expected variant frequency. So if you want to learn more about this you can read their paper.
Another effort that's come online is Strelka. I think this was published just soon after MutationSeq, and it actually comes right out of Illumina. This is where you can get the software, and this is the sort of workflow. It's a nice piece of software that produces VCF files from BAM files, and I think you'll use it in the lab coming up.
One of the things that we've really been trying to capture with MutationSeq is this question of whether the genome as a whole can be used as something that we can actually classify. So we've been working on visualization techniques to accompany our software. What you can see here, for example, is allele ratio as a function of coverage, and this really shows a density of data points. This is over 4,000 data points in this particular tumor, and you can see that there are these different clusters. One can make the assumption that these represent different clonal populations that are at different abundances in the tumor. So that gives us a sort of global picture. Then we also calculate this substitution profile, so these are trinucleotide contexts, and this gives us a picture of what mutational mechanisms may be operating in that cell. And then finally we look at the density of mutations across the genome.
You can see it's highly non-uniform. It has been reported before that there are parts of the genome that are more mutable than others, and there's a lot of literature on that.
So here's an example of how we can use these profiles, mutation portraits if you will, to do quality control. This is a lymphoma sample that we sequenced, and you can see that there's a huge enrichment for C-to-A mutations. This is not expected in this tumor type at all, and in fact they're all associated with very, very low allele ratio variants. There's a paper put out by the Broad that suggests that there's oxidation of DNA that happens during over-sonication of libraries, and that induces this C-to-A substitution. So we think that this is actually a complete artifact: 90% of these mutations are due to this artifact. So this is a way of doing QC. Hopefully you catch this before you sequence, but in this case we had already done all the work and then found this.
Sorry? So they did release a piece of software that can actually remove those, but what this is doing is just visualizing the data as it is. So you can take this whole genome profile from the MutationSeq software and visualize it, and you'll do this in the lab as well: you take a VCF file and output this thing. And when you look at this, it should raise alarm bells right away that this is probably a bad library.
Yeah? Oh, it's an enrichment of C-to-A mutations over here.
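The quality-control check described above, a pile-up of C-to-A calls at very low allele ratio, can be approximated numerically as well as visually. The following Python sketch is only an illustration of that idea, not the published artifact filter and not part of any caller. The function names, the 10% allele ratio cutoff, and the 50% alarm level are assumptions chosen for the example.

```python
def c_to_a_fraction(calls, max_allele_ratio=0.1):
    """Fraction of low-allele-ratio calls that are C>A (or G>T on the other strand).

    `calls` is an iterable of (ref, alt, allele_ratio) tuples.
    """
    low = [(ref, alt) for ref, alt, ratio in calls if ratio <= max_allele_ratio]
    if not low:
        return 0.0  # nothing at low allele ratio, so nothing to flag
    hits = sum(1 for pair in low if pair in (("C", "A"), ("G", "T")))
    return hits / len(low)


def looks_like_oxidation_artifact(calls, max_allele_ratio=0.1, alarm_fraction=0.5):
    """Flag a library when C>A dominates the low-allele-ratio calls.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    return c_to_a_fraction(calls, max_allele_ratio) >= alarm_fraction


if __name__ == "__main__":
    # Invented calls: many low-ratio C>A variants plus a few ordinary ones.
    suspicious = ([("C", "A", 0.05)] * 8
                  + [("C", "T", 0.08)] * 2
                  + [("C", "T", 0.45)] * 5)
    print(c_to_a_fraction(suspicious))
    print(looks_like_oxidation_artifact(suspicious))
```

A flag from a check like this is a prompt to look at the plots and the library preparation, not proof of an artifact, since some tumor types carry genuine C-to-A enrichment.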
Yeah, and it corresponds to very, very low allele ratio. So here it's somewhere around 10% or less, and that's the signature. We haven't really done a whole lot of FFPE genomes. I would assume that that would be the case, but we typically work with frozen material. Yeah. So we'll go over VCF format in the lab. There's a nice web page here that talks about the GATK's VCF format. VCF, the Variant Call Format, is a rather loose format; I would hesitate to even call it a format. But it does have some common characteristics, in the sense that it has a chromosome, a position, you can have an ID for that particular position, which could be a SNP, we have the reference base, the alternate base, and then a series of different characteristics. So in MutationSeq these characteristics are, for example, the probability of somatic mutation, the counts in the tumor of the reference and alternate alleles, the counts in the normal of the reference and alternate alleles, the trinucleotide context, and then we actually output the number of proximal insertions and deletions, because that tends to really wreak havoc on the actual call. And then we have a threshold on the probability for a positive call. And this is what it ends up looking like. You'll explore this in the lab as well. Here you can see the trinucleotide context, you can see the different counts, and then here's the probability. So, in the same read? Yeah, okay. Okay, and so here's just a list of software that's available out there for somatic mutation detection. These are all packages that work; they run; they have various quality. I would say that probably Strelka... sorry,
I haven't put Strelka here, but it's on the other slide. Strelka, MuTect and MutationSeq are performing somewhat similarly, and all much better than, for example, SomaticSniper and JointSNVMix. And in the DREAM challenge, actually, MuTect is blowing everybody away, but I think again that's because it's really synthetic data. Okay, so we have visualization tools, IGV, and then we have annotation tools. Yeah? If you don't have a control, don't sequence. Yeah, I wouldn't sequence. I mean, the somatic mutation signal is buried in one one-thousandth of the variants, right? So you may get lucky, but otherwise I just wouldn't do it. Study a different tumor. Okay. Not necessarily. I mean, if you have contact with the patient, the patient's still alive, the patient can consent, then great. But that's easier said than done. You can't just say "I can always draw blood"; I would say you can rarely draw blood. Yeah, you don't need matched normal tissue, you need normal DNA. Yeah, but I think you're talking about if you don't have any source of matched normal DNA. Yeah. Okay, so very quickly going through the advanced and future topics. What's our time? I think we're out of time probably, but okay. Can you give me five minutes? Five more minutes to wrap up.
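As an aside on the mutation-portrait QC idea from before the questions, here is a minimal sketch of how one might tally substitution classes among low-allele-ratio calls in a VCF-like file, to spot something like the C-to-A enrichment described for the lymphoma library. This is not the MutationSeq visualization code. The INFO keys `TR` and `TA` (tumor reference and alternate read counts) and the 10% cutoff are assumptions made for this example; check the header of your own VCF for the real key names.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def parse_info(info):
    """Turn 'K1=V1;K2=V2;FLAG' into a dict (bare flags map to True)."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def substitution_class(ref, alt):
    """Collapse a SNV onto the pyrimidine strand, so G>T is counted as C>A."""
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"

def low_ratio_profile(vcf_lines, max_ratio=0.10):
    """Count substitution classes among SNVs whose tumor allele ratio
    (alt / (ref + alt)) is at or below max_ratio.
    Assumes hypothetical INFO keys TR and TA hold the tumor read counts."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        if ref not in COMPLEMENT or alt not in COMPLEMENT:
            continue  # keep only simple single-base substitutions
        info = parse_info(fields[7])
        t_ref, t_alt = int(info["TR"]), int(info["TA"])
        depth = t_ref + t_alt
        if depth > 0 and t_alt / depth <= max_ratio:
            counts[substitution_class(ref, alt)] += 1
    return counts

# Made-up records: CHROM POS ID REF ALT QUAL FILTER INFO
records = [
    "1\t100\t.\tC\tA\t.\tPASS\tTR=95;TA=5",
    "1\t200\t.\tG\tT\t.\tPASS\tTR=92;TA=8",
    "2\t300\t.\tC\tT\t.\tPASS\tTR=60;TA=40",
    "2\t400\t.\tT\tC\t.\tPASS\tTR=97;TA=3",
]
profile = low_ratio_profile(records)
print(profile.most_common())
```

If one class such as C>A dominates the low-ratio calls in a tumor type where it is not expected, that would be a prompt to suspect the library rather than the biology.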
All right. So one of the very important concepts of next-gen sequencing is not just its ability to broadly sequence across the genome in high throughput, but rather the digital capacity for measuring alleles. So what we have here is a pool of DNA, and again this comes from a mixture of populations in a particular sample. In this mixture, let's say only 20% of the cells may harbor a particular mutation. When we get the reads out, the reads will harbor the mutation in relative proportion to the source material in the first place. This is often underappreciated, but if we do very deep sequencing, we can get very precise estimates of the allele counts of a particular mutation, and we can use that to make inferences about how these populations may be shifting over time. We've shown some of that work in a couple of different studies. So I like to break this down into three experimental designs where this can really be leveraged.
The first is deep analysis of a single sample, where one tries to reconstruct a population structure from measuring these alleles at multiple different loci. The second, you can imagine, is this multi-region sampling idea that I showed from Charlie Swanton's paper; we published a paper last year in ovarian cancer that used a similar strategy. And then the last is in time series. So if you have a primary tumor, and then you're lucky enough, or the patient is unfortunately unlucky enough, to relapse, and we can study that relapse biopsy, then one can actually start to make inferences about how these clonal populations have changed in the context of therapy. This can also be done in model systems such as xenografts, where patient tumor material is engrafted into mice, and then we can follow how those populations change in mice in the context of drug selection experiments. So this is a sort of breakdown of these different experimental designs. But when we measure a particular mutation very deeply at the locus (so here they have this A), and this can be done through PCR followed by deep sequencing, the allele fraction we observe arises from multiple factors. The first is the amount of normal cell contamination. Then we have this notion that the mutation may only be present in a subpopulation of cells, okay, and that's an unknown quantity, and so that will give rise to that observation as well. And then we have this concept of the mutational genotype. This is really where copy number can come into play to help interpret genomes and mutations.
So here you have an instance of a diploid heterozygous mutation; here is a homozygous mutation. And so the allele fraction should be 50% in the one case and 100% in the other. But then you may have a case where you have an allele-specific amplification, and that could yield an allele fraction of 25%: you have four copies and one gets mutated, and that's going to yield an allele fraction of 25%. So how do we take all this and deconvolute it? The goal here would be to estimate the proportion of cells harboring a particular mutation from that allele fraction. We've developed a method called PyClone, which is a Bayesian probabilistic model. Again, for the statistically inclined, you can read about it; it's published now in Nature Methods. I'm not going to go into too much detail, except to say that we incorporate these three major concepts, including the contamination by normal cells and the copy number at the locus, and then try to estimate from that the proportion of cells that are likely to harbor a particular mutation. So that has implications for how we interpret these mutations in terms of the evolutionary history. So that's to go back to the introduction of the lecture now: we're trying to put this in the context of evolution here.
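The relationship between allele fraction, normal contamination, copy number and the proportion of cells carrying a mutation can be sketched with a simple expected-value formula. This is a deliberately simplified illustration, not the PyClone model itself: it assumes every tumor cell has the same total copy number at the locus, that normal cells are diploid, and that the number of mutated copies per mutant cell is known. The function and parameter names are invented for this example.

```python
def expected_allele_fraction(purity, prevalence, mutant_copies,
                             tumor_copies, normal_copies=2):
    """Expected variant allele fraction under a simplified mixture model.

    purity        : fraction of cells in the sample that are tumor cells
    prevalence    : fraction of tumor cells carrying the mutation
    mutant_copies : mutated copies of the locus in each mutant cell
    tumor_copies  : total copies of the locus in each tumor cell (assumed
                    identical across tumor cells)
    normal_copies : total copies of the locus in each normal cell
    """
    mutant = purity * prevalence * mutant_copies
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant / total

def cellular_prevalence(allele_fraction, purity, mutant_copies,
                        tumor_copies, normal_copies=2):
    """Invert the formula above: fraction of tumor cells with the mutation."""
    if purity <= 0 or mutant_copies <= 0:
        raise ValueError("purity and mutant_copies must be positive")
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    phi = allele_fraction * total / (purity * mutant_copies)
    return min(phi, 1.0)  # noisy inputs can push the estimate above 1

# The three genotype cases from the lecture, in a pure, fully clonal tumor:
print(expected_allele_fraction(1.0, 1.0, 1, 2))  # diploid heterozygous
print(expected_allele_fraction(1.0, 1.0, 2, 2))  # diploid homozygous
print(expected_allele_fraction(1.0, 1.0, 1, 4))  # one mutated copy of four

# Same heterozygous mutation with 50% normal contamination, observed at 12.5%:
print(cellular_prevalence(0.125, 0.5, 1, 2))
```

The point of the sketch is that the same observed allele fraction can correspond to quite different proportions of cells depending on purity and copy number, which is why those quantities have to enter the model before clustering mutations into clones.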
You have the output of the algorithm, which shows the inferred cellular prevalence of a mutation on the x-axis, and then these are each of the mutations that we try to cluster together. You can see that they cluster into these different waves, if you will, of clonal expansion, whereby those three mutations on the left are very likely to be in the initiating clone, because they're in the majority of cells. So these would be the truncal mutations at the root of the tree, if you will, and then you have the subsequent acquisition of mutations over time. This algorithm allows us to infer this, and we can start to relate this to the presumed phylogenetic progression over time. We've done some extensive benchmarking of the algorithm, where we've had synthetic mixtures and tried to recapitulate what the mixture should be. Here's the ground truth, which shows the expected prevalence of particular mutations, and then this is our estimate of that. So then the output of this can be viewed as follows. Here's just an example of a time series experiment, where we have the tumor allelic prevalence, this is just the allelic fraction, on the x-axis, and the allelic fraction of the xenograft from that same tumor over time. You'd expect that these mutations, if they were constant or preserved in the xenograft, would all sort of lie on the diagonal like this, where they'd be highly correlated. But you see that there are mutations on the axes, and that really does suggest that there's a clone in the tumor that does not expand in the xenograft. And you have a clone in the xenograft here.
That's expanded from a very minor clone in the tumor. So then we can run this through our algorithm, and we get a picture that looks something like this. This suggests that all these gray mutations are the ancestral mutations that are common to both the tumor and the xenograft, and then you have these events on the axes that really suggest either clones that were never engrafted or rare clones in the tumor that expanded over time. So you can imagine doing this over multiple series and trying to track mutations as they evolve in different populations. Okay, so the last topic is really an exciting new field, I think, and this is using similar ideas, where we can deep-sequence a particular mutation, to develop liquid biopsies using plasma. We know that a tumor might have a particular mutation, and then there's some treatment that's administered, and then we can try to monitor the patient through draws of blood and extract the cell-free DNA from the plasma. The idea is that tumors will shed their DNA into the circulation when those cells apoptose. For particular point mutations, like KRAS for example, because you know exactly what that mutation is going to be, you can look for that mutation in the cell-free DNA in the plasma over time and use this as a non-invasive monitoring tool, a surrogate for tumor burden. What this paper showed is that the cell-free DNA showed evidence of the tumor relapse approximately nine months in advance of imaging. So it's a highly sensitive way, in certain contexts, of monitoring relapse and response. So this is just a way that a mutation can be used as a biomarker in a liquid biopsy to measure relapse, and it uses this concept of deep sequencing of particular mutations. This is the reference here.
You should go look at it. Here you have an example from a different paper, whereby this is a breast cancer that harbored a p53 mutation and a PI3 kinase mutation. You can see that on administration of paclitaxel, this clone that had the PI3 kinase mutation (this is just using the cell-free DNA) is basically extinguished over time. It rebounds at some point, but the paclitaxel had an effect on the clone harboring the PI3 kinase mutation, and essentially this could be extracted from looking at the deep sequencing of these alleles in cell-free DNA extracted from plasma. So I think this is a really promising new area, whereby one can envision developing non-invasive liquid biopsies to monitor tumor progression over time. So with that I think I'll conclude, and just say that, you know, if we go back to Peter Nowell, this is really quite amazing. He says that the acquired genetic instability and associated selection process, most readily recognized cytogenetically, results in advanced human malignancies being highly individual, both karyotypically and biologically. Hence each patient's cancer may require individual specific therapy, and even this may be thwarted by emergence of genetically resistant sublines, and more research should be directed towards understanding and controlling the evolutionary process in tumors before it reaches the late stage usually seen in clinical cancer.
There's a couple of really important concepts here. First is that of individualized treatment: well, that's probably best done by mutation profiling. And then this idea of resistance: well, that invokes evolution, and so we need to be able to model evolutionary processes in these cancers as they're on treatment, and maybe ctDNA analysis is going to be the way to do that. The remarkable thing about this statement is that none of this measurement technology was available to scientists in that era, and yet here we are. So I think we can actually make this a reality as we go forward in the next few years. It may not work; there'll be lots of problems along the way. But I think in some patients we'll be able to help, for sure. So with that I think I'm done for the day. I'd just like to thank you all for listening. A lot of the ideas that I've presented to you are a result of working with a lot of very talented graduate students, such as Andrew, where we've had lots of great discussions about how we move the field forward in terms of cancer genome informatics, and I'd like to thank the funding agencies that have provided support for my work over time. So I'll conclude there. Thank you very much.