All right. Thank you. Hi, I'm Chris Miller from WashU. I'm going to be talking to you today about tumor heterogeneity, clonal evolution, and how we're using sequencing to get some insight into both of these phenomena.

So my first statement here will be that tumors are heterogeneous. This was suspected as far back as the 70s, but it really took the advent of high-throughput sequencing before we were able to dive deep into these tumors and see that they are, in fact, genetically diverse populations of cells. And because of that, evolution is occurring within them at the cellular level.

Last year, we were able to view this in action in a case of relapsed AML, where we sequenced an AML tumor, a matched normal, and a relapse. Using that, we were able to put together a model of exactly how this clonal evolution works, at least in this case. It starts off with the hematopoietic stem cell, which gains initiating mutations here. As the tumor expands, some of these cells acquire additional mutations, represented here in purple, yellow, and orange. These subclones may expand, and so when we assay the tumor at diagnosis, what we're getting is really a cross-section of this clonal architecture of the tumor, where some of the cells look a lot like the founding clone, some of them belong to this subclone that occurs in about 50 percent of the cells, and others make up smaller fractions of the tumor. As chemotherapy is then introduced and treatment goes on, it creates a population bottleneck, where this population of cells is reduced and only a few pass through. This population then expands back into a relapse, and it acquires additional mutations after the treatment ceases.
And so what's really interesting in this particular case is that the clone that actually went on to form the relapse appeared in only about 5 percent of the cells in the original tumor, which is a little frightening, to be perfectly honest, and makes us wonder whether we could have missed it if we hadn't looked more carefully. So detecting these minor subclones is, we think, crucially important to understanding how these tumors are responding to therapy, and to making sure that we get the whole tumor and not just the major subclone.

I think there are several challenges that remain in detecting these. First of all, the genomes are sequenced at low coverage. I mean, 30X whole genome sequencing is clearly not enough to detect events that are present in only 1 or 2 percent of the tumor. And even 100 or 150X exome sequencing may not be deep enough. But that at least seems like a tractable problem, since sequencing costs are dropping rapidly.

Perhaps a more pervasive problem that we're interested in is that, by and large, algorithms aren't designed to detect these low-frequency events. If you look at this power simulation from our SomaticSniper algorithm, which is one of the first generation of variant callers, you'll see that even with 90X coverage, our power to detect events at 20 percent variant allele frequency is only 85 percent. And if we drop down to 10 percent variant allele frequency, it's only 10 percent. So we're clearly missing a lot of these low-frequency events.

That spurred us to develop an algorithm called Bassovac: Bayesian scoring of somatic variant read counts. It's a little convoluted, but it works. It incorporates purity, ploidy, base quality, and a host of other factors into a more complex model. We pull these all together into a Bayesian framework and then obtain the probabilities that a particular single nucleotide variant is either heterozygous or homozygous given the input data.
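As a rough illustration of the kind of Bayesian genotype scoring described above, here is a minimal sketch. It is not the Bassovac model: it assumes a diploid region, a binomial read-count likelihood, a single flat `error_rate` in place of per-base qualities, and illustrative prior values. The function name `genotype_posterior` and all of its parameters are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k variant reads."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite at the boundaries
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

def genotype_posterior(var_reads, depth, purity=1.0, error_rate=0.01, priors=None):
    """Posterior over tumor genotypes (ref, het, hom) at one diploid site."""
    if priors is None:
        # Assumed, illustrative priors; a real caller would derive these.
        priors = {"ref": 0.998, "het": 0.001, "hom": 0.001}
    # Fraction of variant alleles within tumor cells for each genotype.
    tumor_vaf = {"ref": 0.0, "het": 0.5, "hom": 1.0}
    log_post = {}
    for genotype, vaf in tumor_vaf.items():
        # Diploid normal cells carry no variant, so they dilute the signal.
        sample_vaf = purity * vaf
        # Probability that a single read reports the variant base.
        p_read = sample_vaf * (1.0 - error_rate) + (1.0 - sample_vaf) * error_rate
        log_post[genotype] = math.log(priors[genotype]) + log_binom_pmf(
            var_reads, depth, p_read
        )
    # Normalize in log space to avoid underflow.
    top = max(log_post.values())
    norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {g: math.exp(v - norm) for g, v in log_post.items()}

# Example: 15 variant reads out of 30 in a tumor sample assumed to be 80% pure.
print(genotype_posterior(15, 30, purity=0.8))
```

A sketch like this only distinguishes three clonal genotype states. Scoring subclonal variants at 1 or 2 percent would need additional states or a continuous frequency parameter, which is left out here.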
And so we've tested this against other algorithms. This is kind of a worst-case simulation, but you can see that even in this difficult environment, it's pushing our curve of variant allele frequency far to the left compared to SomaticSniper and these first-generation callers.

We've also done some real-world testing, and I want to tell you about one particularly cool data set that we've been working with. This is a quintet of samples, including a primary breast tumor, a matched normal, and three different metastases: spinal, liver, and adrenal gland. We sequenced the genomes of all of these to 30X and ran them through our initial pipeline. This was prior to the development of Bassovac. Then capture validation was performed for all these variants, so we have very deep sequencing read counts for all of these variants in all of these samples. We were able to combine these and then make cool plots that look like this.

What I'm showing you here on the x-axis is the variant allele frequency of single nucleotide variants in the primary tumor. And on the y-axis, you're seeing the frequency of those events in the metastasis. Several trends emerge from this kind of plot. At about 50%, which corresponds to 100% of the cells in the tumor for heterozygous events, we see the major clone, or founding clone, which is also present at about 50% in the metastasis, as we'd expect. Down here we see another cluster, a minor clone that is present at about 25%, and that, again, also passed through to the metastasis. In contrast, down here on the x-axis, what we see is a clone that was present in the original tumor but didn't pass through to the metastasis. So this was a separate population of cells that didn't make it through the population bottleneck, or didn't make it through the metastasis event. Then on the y-axis, what we see are events that happened in the spinal metastasis, presumably after the split.
So they're not present in the original tumor. Or at least that's what we thought. When we zoomed in a little bit closer to this y-axis, what we can see is that these events I've highlighted in red actually were present in the tumor, just at a very low frequency. That suggests that maybe they either had a growth advantage in the environment of the metastasis or just made up a majority of the cells that split off into the metastasis. But either way, they're clearly present.

Getting back to our variant calling, this gives us a source of very low-frequency variants in this tumor that we know are real because they're present in the metastasis as well. So we use these kinds of events to test the sensitivity of our algorithms. This is a comparison between three algorithms: Bassovac, our new caller; Sniper, our old caller; and Strelka, a caller from Illumina that purports to do better on these kinds of low variant allele frequency events. What you can see is that Bassovac and Sniper detect a lot of events, and their performance is very comparable at high and mid variant allele frequencies. Strelka maybe doesn't do as well. But at about 10% there's an inflection point where Sniper just isn't able to detect things. Strelka does a little bit better, but Bassovac detects a huge number of these very low-frequency true positive events.

In the end, even with this biased approach that started from 30X data, we can see that 50% of the variants present in the metastasis are present at a detectable level in the tumor, even though we would have expected a much smaller proportion if we hadn't looked closely and deeply with the capture validation. But more importantly, what we can say here is that we can use Bassovac to detect these true variants at very low frequencies, down to 2% and even lower.
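To make the contrast between 30X discovery and deep validation concrete, the following is a minimal sketch of sampling power. It is not the power model of SomaticSniper, Strelka, or Bassovac. It assumes that variant-supporting reads follow a binomial distribution given depth and variant allele frequency, and that a hypothetical caller needs at least `min_reads` supporting reads (an illustrative threshold). The depth of 1000 stands in for "very deep" capture data and is also an assumption. Sequencing error and comparison against the normal are ignored.

```python
from math import comb

def detection_power(depth, vaf, min_reads=3):
    """P(at least min_reads variant reads) when reads ~ Binomial(depth, vaf)."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_too_few

# Compare discovery-depth and validation-depth sampling at low frequencies.
for depth in (30, 150, 1000):
    for vaf in (0.02, 0.10):
        power = detection_power(depth, vaf)
        print(f"depth={depth:5d} vaf={vaf:.2f} power={power:.3f}")
```

Under these assumptions, a 2% variant is almost never sampled three times at 30X, while at validation-level depth it is sampled nearly every time. That is consistent with the point above that low-frequency events only became visible after deep capture sequencing.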
So given this kind of information about these low-frequency variants, how can we put it to use to infer the subclonal architecture of a tumor and find out how many clones are present and which variants are present in the different subclones? This really requires an integrative approach. We look at the variant allele frequencies of the SNVs, as well as information on the copy number calls, purity, and ploidy. And we can put it all together into beautiful charts that look like this. I'm going to zoom in so you can actually see what's going on here.

We segregate the SNVs according to copy number and then plot their variant allele frequencies, along with the depth on the y-axis, just for our reference. You can see here in the 2X plot you get a clear indication of the founding clone. Then we overlay a kernel density plot on top here, so you can see this clear peak at 50%, which tells us this is the major, the founding clone. And these variants down here with a little bit lower frequency correspond to blips up here that represent subclones. Over on the right side you can see events that are copy-number-neutral loss of heterozygosity, up here near 100%. In the copy number three regions, instead of the 50% peak for the major clone, we expect peaks at 33 and 66%, depending on whether the wild-type or the mutant allele got amplified. And we do indeed see that in the data.

So we can build these plots for all of our tumors, and you can kind of eyeball it and say this clearly looks like a two-clone tumor, a major clone and a minor clone. But we get very leery of eyeballing plots. We like to do it in a more rigorous fashion. So we decided to come up with a method that could do this in an automated and unbiased manner. What we ended up doing was creating an algorithm that uses a mixture model of binomial distributions.
It models this data and then uses maximum likelihood estimation to determine the optimal number of clusters for any given solution. We can see that this algorithm does indeed cluster this into two groups, a major clone and a minor clone; I've overlaid the calls here. So this is a biclonal sample. Here's a case where it's a triclonal sample, which clearly agrees with our eyeballing of the data.

And there are cases that are a little less intuitive, I guess. This is a case where, if you looked at just the density plot, you might say this was a two-clone tumor. But if you look carefully, you can see that there's a nice peak here, there's a nice peak here, and then kind of a smear in the middle. The algorithm does a very nice job of picking that up, fitting another curve in the middle there, and saying this is indeed a three-clone tumor.

And then we also have messier tumors. This is a multiclonal sample with a smear of data, and we don't think we're doing too much overfitting in this kind of case. We think there really are a variety of clones here, but it's very difficult to segregate them accurately with this kind of smear of mutations.

So we've applied this across a large sample set of tumors, looking mostly at AML, breast cancer, and endometrial cancer. We can say that most of the tumors in those data sets have at least one founding clone and one or more subclones. I also want to emphasize that the numbers I'm showing here are going to be a lower bound on the number of clones. First of all, detection sensitivity hurts us, because not all of these calls were made using Bassovac. But more importantly, I think, we're unable to distinguish with this kind of data between two independent clones that both occur at, say, 20% variant allele frequency. Without something like single-cell sequencing, there's no way to get that from this data.
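The clustering idea described above can be sketched as a small binomial mixture fit. This is a minimal illustration, not the R package mentioned in the talk. It assumes diploid, copy-neutral variants, fits the mixture with expectation-maximization, and picks the number of clusters by lowest BIC, which is a swapped-in criterion since the talk does not say how the optimal number is chosen. All function names are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k variant reads."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite at the boundaries
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

def point_log_terms(k, n, weights, means):
    """Per-cluster log(weight * pmf) for one variant, plus their log-sum."""
    logs = [math.log(w) + log_binom_pmf(k, n, m) for w, m in zip(weights, means)]
    top = max(logs)
    return logs, top + math.log(sum(math.exp(v - top) for v in logs))

def fit_binomial_mixture(counts, n_clusters, n_iter=200):
    """EM fit; counts is a list of (variant_reads, depth) pairs with depth > 0."""
    vafs = sorted(k / n for k, n in counts)
    # Deterministic start: initial cluster VAFs at evenly spaced quantiles.
    means = [vafs[int((j + 0.5) * len(vafs) / n_clusters)] for j in range(n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each variant.
        resp = []
        for k, n in counts:
            logs, total = point_log_terms(k, n, weights, means)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate mixing weights and cluster VAFs.
        for j in range(n_clusters):
            mass = sum(r[j] for r in resp)
            var_reads = sum(r[j] * k for r, (k, n) in zip(resp, counts))
            all_reads = sum(r[j] * n for r, (k, n) in zip(resp, counts))
            weights[j] = max(mass / len(counts), 1e-12)
            if all_reads > 0:
                means[j] = var_reads / all_reads
    loglik = sum(point_log_terms(k, n, weights, means)[1] for k, n in counts)
    return weights, means, loglik

def choose_n_clusters(counts, max_clusters=4):
    """Return (best_k, fits), choosing k by lowest BIC (an assumed criterion)."""
    fits = {}
    for k in range(1, max_clusters + 1):
        weights, means, loglik = fit_binomial_mixture(counts, k)
        n_params = 2 * k - 1  # k cluster VAFs plus k - 1 free weights
        bic = -2.0 * loglik + n_params * math.log(len(counts))
        fits[k] = {"weights": weights, "means": means, "loglik": loglik, "bic": bic}
    best_k = min(fits, key=lambda k: fits[k]["bic"])
    return best_k, fits

# Toy input: made-up read counts at depth 100 around 50% and 20% VAF.
toy = [(c, 100) for c in (47, 49, 50, 51, 53, 17, 19, 20, 21, 23)] * 6
best_k, fits = choose_n_clusters(toy)
print(best_k, sorted(round(m, 3) for m in fits[best_k]["means"]))
```

Because the fit works directly on read counts, no kernel density bandwidth enters the model. Two independent clones at the same variant allele frequency would still land in a single cluster, which is the lower-bound caveat raised above.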
So in conclusion, we can detect somatic mutations at very low frequencies using Bassovac, our new caller. And we developed an R package for automatically inferring the subclonal architecture of tumors. We hope to release beta versions of both of these by the end of the year. They're not currently available but will be shortly. And really the overarching goal of this kind of research is to characterize these minor subclones at diagnosis, rather than discovering their presence at relapse, when it may already be too late to design appropriate treatments.

Finally, I'd like to acknowledge a host of people who made this possible. Mike Wendl has been leading the Bassovac project, and Nathan Dees has been pushing the clonality analysis out the door. There's a host of people at the Genome Center over here who have contributed in one way or another, our collaborators who have provided data, expertise, and advice, the leadership at the Genome Center, and then our funding agencies, the NHGRI and the NCI, and of course The Cancer Genome Atlas. Thanks.

I wonder if there isn't an important implication in your breast cancer metastasis finding. If I understand correctly, in the metastasis you got both the tumor's dominant clone, with all the 50% alleles, and a tumor subclone. So does that predict that the metastasis must not have come from a single cell, but rather from a clump of cells that had both the minor and the major clone in it? Or are you recreating all those mutations in the metastasis?

Well, any mutation that's present in the founding clone is going to be present in all the subclones as well. But the fact that the subclone does appear at a lower variant allele frequency does indeed predict that it's not a single cell that caused that metastasis, that it was a clump of cells containing both those original ones and a subset with additional mutations from the subclone.

Yeah. I'm not an expert in this area.
I know there's a lot done, lots said, about individual breast cancer cells being found in the bone marrow, et cetera, and whether clumps might be more of the thing that metastasizes.

Yeah, I don't doubt that single cells may be capable of that, but in this case it's clearly not one cell.

Yes, hi, I have a technical question.

Sure.

It seems to me that the selection of the bandwidth of your kernel density estimate should affect the maximum likelihood estimate of the number of subclonal populations. Have you looked into that, or how do you choose that bandwidth?

So we don't actually use the bandwidth when we're doing the binomial fitting. The bandwidth is purely smoothing for the eye, just to get the pretty pictures. We actually just take the raw data and feed it into the algorithm, so yeah.

So I've got a question. If you were to compare the clonality analysis coming from exomes versus full genomes, do they give you similar answers, the same answer?

That's very dependent upon the number of variants that we're finding in these tumors. For example, in some of the whole genomes, and definitely in some of the AML exomes, you see very few mutations. It's very hard to cluster with only 10 mutations, right? It's very hard to know what's going on there. So we do have to set a minimum threshold on the number of mutations that we have.

The breast cancer data, where there are 20 basal genomes, fully sequenced, and then the exomes on those, right?

Yeah, when we have whole genomes, you can include all those tier two and tier three mutations and get hundreds of mutations, so you get much finer resolution on your subclonal architecture. With just tier one exome stuff, it's a little bit harder, but we can do it, provided that there are enough mutations in the sample.

There's enough mutations. Gotcha, all right. Thank you.

So our next speaker will be Adam Ewing from UC Santa Cruz.