So, cancer is a disease of the genome, and although you can arguably make that statement about a lot of diseases, part of what makes cancer fascinating is that there are two distinct ways in which that statement is true. As with many other diseases, there are germline risk variants that confer a lifetime risk of developing cancer, but there are also somatic alterations that develop in the cancer tissue and directly contribute to tumorigenesis, metastasis, drug resistance, and other phenotypes. The heritability of cancer has been known for a while from family and twin studies, and it varies quite a bit from tumor type to tumor type. We know that the inheritance of cancer is not simple and is very likely mediated by complex polygenic inheritance.
Part of the genetic revolution of the 80s and 90s was the use of positional cloning and family studies to discover rare, highly penetrant variants that mediate familial cancer syndromes. The highlights were the discovery of p53 in Li-Fraumeni syndrome, APC in familial colon cancer, the MLH and MSH genes in Lynch syndrome, BRCA1 and BRCA2 in familial breast cancer, and RB1 in retinoblastoma. Together, all these high-risk, highly penetrant variants explain only a small percentage of the heritable risk for various tumor types.
More recently, genome-wide association studies examining common SNP variation in case and control cohorts have yielded a host of loci in both cancer and non-cancer diseases. Over the last five years there have been a number of cancer GWASs that have yielded over 300 loci in over 20 cancer types. Together these loci can explain as much as 10, 23, and 6% of the heritable risk for breast, prostate, and colorectal cancer, respectively.
So here's a large-scale picture from a 2010 review of cancer GWAS loci, and we see that there are a number of these loci around the genome. The majority of the GWAS loci have been discovered for prostate, pancreatic, and lung cancer, hematopoietic tumors, and breast cancer. A lot of these loci lie in intergenic regions, and their role in disease has not been functionally elucidated. Among familial germline loci there is a lot of precedent for genes that are mutated in the germline of families but are also somatically altered in cancer. Among these, of course, are p53, APC, RB1, CDKN2A, VHL, and NF1. A lot of these are tumor suppressor genes that undergo a two-hit alteration, whereby they are inactivated in the germline and then undergo a second hit in the tumor tissue, resulting in loss of function. So we decided to examine the interplay of common variant loci discovered in GWAS and somatic copy number alterations, to examine this hypothesis on the genome scale.
We took 297 cancer loci documented in the NHGRI GWAS database from over 80 GWASs, and then took a list from a study published in 2010 by Beroukhim and colleagues at the Broad examining somatic copy number alterations across over 3,000 tumors and over 20 tumor types. That study yielded 258 copy number peak regions. Our approach was quite simple: examine the overlap between these two sets of loci and determine whether it was significant against a null model built using permutations.
Briefly, the way we assembled the data was by obtaining these loci from the GWAS database. These are usually reported as a single SNP. We of course had to handpick the cancer-related GWASs, and then for each SNP we defined a locus using the linkage disequilibrium neighborhood of that SNP. After collapsing these loci into unique regions, we found 219 unique GWAS loci covering 1.1 percent of the mappable genome.
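As a rough illustration of the collapsing step just described, here is a minimal Python sketch. It assumes each locus has already been turned into a (chromosome, start, end) interval; the function names, the coordinates, and the genome length are hypothetical and do not come from the study, and the linkage disequilibrium neighborhood definition itself is not shown.

```python
def merge_loci(loci):
    """Collapse overlapping (chrom, start, end) intervals into unique regions.

    Assumption for illustration: intervals are half-open, and two loci are
    merged whenever they overlap or touch on the same chromosome.
    """
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_fraction(regions, genome_length):
    """Fraction of a genome of the given length covered by merged regions."""
    return sum(end - start for _, start, end in regions) / genome_length


# Hypothetical toy loci, not real GWAS coordinates.
toy_loci = [
    ("chr8", 100, 200),
    ("chr8", 150, 300),
    ("chr8", 500, 600),
    ("chr11", 150, 250),
]
unique_regions = merge_loci(toy_loci)
toy_fraction = covered_fraction(unique_regions, genome_length=4000)
```

With real data, `genome_length` would be the size of the mappable genome, which is how a figure such as the 1.1 percent above would be obtained.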
On the SCNA side we took all the loci reported in the 2010 study by Beroukhim and colleagues. We took the pan-tumor analysis combined with all the tumor-type sub-analyses and found 258 total SCNA peak regions comprising 8.4 percent of the mappable genome, and that included 198 amplification hotspots and 67 deletion peak regions. Crossing these two sets of loci, we see the plot shown here. This is a genome-wide plot where the GWAS loci lie on the chromosomes as these little dots, the SCNA peak regions hover over each chromosome, and the intersecting, overlapping loci are shown in red. Among these intersecting loci we have a lot of known cancer genes as well as novel loci.
To determine the significance of this overlap we performed a number of genome-wide permutations. What we found was a strikingly significant overlap between GWAS loci and frequently amplified regions of the cancer genome. In contrast, when we compared cancer GWAS loci against commonly deleted peak regions, we saw the distribution of overlap shown here in the left plot, and following permutation we found that this intersection was not significant. So this is quite interesting. We pursued a second, orthogonal line of investigation by comparing cancer-related GWAS loci against non-cancer-related GWAS regions, and we found that cancer-related GWAS regions were significantly enriched in overlap with amplification peak regions but, again, did not frequently overlap with deletion SCNAs.
One feature of the GWAS findings across all the cancer germline analyses is that there is a hotspot of association, with multiple loci for prostate cancer, GBM, and other cancer types, in the 8q24 region near the MYC locus.
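The permutation-based overlap test described above can be sketched roughly as follows. This is a toy version under stated assumptions, not the analysis from the talk: the genome is a single linear coordinate axis, the null model relocates each GWAS locus uniformly at random while keeping its width, and all names and coordinates are hypothetical. The talk does not specify the exact permutation scheme that was used.

```python
import random


def overlaps_any(locus, regions):
    """True if the half-open (start, end) locus intersects any region."""
    start, end = locus
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def count_overlaps(loci, regions):
    """Number of loci that intersect at least one region."""
    return sum(overlaps_any(locus, regions) for locus in loci)


def permutation_p_value(loci, regions, genome_length, n_perm=1000, seed=0):
    """Empirical one-sided p-value for the observed number of overlaps.

    Assumed null model (illustrative only): each locus is moved to a
    uniformly random position on a single linear genome, keeping its width.
    """
    rng = random.Random(seed)
    observed = count_overlaps(loci, regions)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in loci:
            width = end - start
            new_start = rng.randrange(0, genome_length - width + 1)
            shuffled.append((new_start, new_start + width))
        if count_overlaps(shuffled, regions) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy data: three GWAS loci and two amplification peaks.
toy_gwas = [(100, 110), (205, 215), (900, 910)]
toy_peaks = [(90, 120), (200, 220)]
observed_overlap, p_value = permutation_p_value(toy_gwas, toy_peaks, 10_000)
```

A real analysis would need to respect chromosome boundaries and unmappable sequence, which this sketch ignores.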
So to examine whether our results were robust to the removal of this 8q24 region, which is also known to be frequently amplified, we did a separate analysis where we excluded these loci, and we still found a significant colocalization of amplification peak regions with cancer GWAS loci outside the MYC locus. We also applied this analysis to different tumor subtypes, and we saw a significant association between lung and hematopoietic peak regions and cancer GWAS loci.
So these results show an interesting correlation, or colocalization, and they could be explained very simply by regions or genes that tend to undergo somatic alteration frequently in some patients and in other patients are mutated in the germline genome and confer lifetime risk. But that doesn't really suggest any kind of germline-somatic interplay. For example, the classic model of tumor suppressor loss involves a germline inactivation followed by a somatic alteration that results in loss of function. So we wanted to ask whether germline SNP status actually confers risk for specific somatic alterations, in this case amplifications. A really interesting precedent for this was published a few years ago, where a common variant SNP located in JAK2 that confers risk of myeloproliferative disease was shown to predispose to the development of the somatic point mutation in JAK2 in these same neoplasms. What was really fascinating was that the risk-conferring germline common variant would occur on the same haplotype as the somatic mutation.
So we wanted to pursue this question in the context of copy number alteration. How would we detect this? Well, suppose we have a germline heterozygous SNP in a patient's tumor tissue, and the tumor decides to undergo a copy number alteration, in this case an amplification, at that locus.
Well, sometimes it might choose the copy carrying the C allele to amplify, but other times it may choose the copy carrying the A allele. If the tumor is indifferent, if it does not care about the status of that germline allele, then it will amplify these two alleles in equal proportion. However, if there is something special on that C allele, something that is positively selected for, then we'll see the tumor amplifying that C allele time and time again. The way you can test this is with something called the allelic distortion test, which was first proposed by Dewal and colleagues in Bioinformatics in 2009. Basically, at each heterozygous SNP that undergoes a copy number alteration, we can measure how frequently one allele versus the other allele is amplified or deleted. Then we can test for significant deviation of this frequency from one half using a chi-square distribution or a Fisher's exact test. So if we see that the tumor amplifies allele A in 30 out of 35 cases, that may be evidence for some kind of allelic bias and a germline-somatic interplay.
We examined this in several data sets, including the original GCM (Global Cancer Map) data and TCGA data using SNP 6.0 arrays, and we were disappointed to find that zero of the 36 cancer GWAS loci that intersect a somatic amplification peak showed significant allelic distortion. We then applied this scan genome-wide, and we found a significant allelic distortion at 11q13.
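To make the allelic distortion idea concrete, here is a minimal Python sketch of a test for deviation from one half at a single heterozygous SNP. It is an illustration under assumptions, not the published implementation: the function names are hypothetical, the chi-square version is a plain one-degree-of-freedom goodness-of-fit test, and the exact version is a two-sided binomial test used here as a stand-in for the exact test mentioned above.

```python
import math


def chi_square_distortion(n_a, n_b):
    """Chi-square goodness-of-fit test (1 df) against a 50/50 allele split.

    n_a and n_b are the numbers of tumors that amplified allele A and
    allele B at one heterozygous SNP. Returns (statistic, p_value).
    """
    n = n_a + n_b
    if n == 0:
        raise ValueError("need at least one informative tumor")
    expected = n / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def exact_binomial_distortion(n_a, n_b):
    """Two-sided exact binomial p-value under the null of equal amplification."""
    n = n_a + n_b
    if n == 0:
        raise ValueError("need at least one informative tumor")
    k = max(n_a, n_b)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null is symmetric, so double the tail and cap at 1.
    return min(1.0, 2 * upper_tail)


# The example from the talk: allele A amplified in 30 of 35 tumors.
chi2_stat, chi2_p = chi_square_distortion(30, 5)
exact_p = exact_binomial_distortion(30, 5)
```

In a genome-wide scan this would be run at every SNP with enough heterozygous, amplified tumors, so the resulting p-values would need a multiple-testing correction before any locus is called significant.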
Zooming in on this locus, we found that the allelic distortion occurred at a SNP that lies 100 kb upstream of the cyclin D1 locus and near the MYEOV gene. Examining the individual events that contribute to this signal, we found that 44 out of the 50 tumors that amplify this SNP choose the C allele over the T allele. We can see here the individual tumors that support the signal: across many tumor types, tumors that amplify this locus, which is of course a frequently amplified region, tend to choose the C allele over the T allele. We applied the same analysis to the TCGA SNP 6.0 data spanning over 2,000 tumors, and again we saw many tumor types, ovarian, breast, lung, amplifying this locus and choosing the C allele over the T allele, and this was strikingly significant. We saw the same effect in the Broad-Novartis Cancer Cell Line Encyclopedia, a set of over 900 cell lines that have been profiled for expression, copy number, and sequence, and there too it was quite significant.
So this is an interesting result, partly because the SNP lies in a frequently amplified region and not too far from a GWAS peak, but not in an actual significant GWAS region. So the question is, what is the biology that may be driving this? We clearly see that these tumors preferentially amplify the C allele over the T allele at this locus. Does the germline C allele carry some kind of selective advantage that the tumor is going after? For example, is the nearby cyclin D1 more highly expressed, or does it somehow have a selectively advantageous genotype? Or perhaps there is some kind of interaction with the amplification machinery that alters the local somatic amplification rate, and that phenotype is carried by, or tagged by, that C allele.
So we took some steps to examine this hypothesis. Looking at the Cell Line Encyclopedia, we examined the interaction of the allele status of this SNP with total copy number and expression, to see whether there was an expression quantitative trait locus at this SNP. We found a mild but significant effect whereby, with increasing doses of allele C and controlling for total copy number, we actually see increased expression of cyclin D1. We did not see this eQTL with any of the other genes in this locus.
So in summary, we see a significant overlap of germline GWAS loci and SCNA peak regions across cancer types. This is predominantly with amplifications, and also, which I didn't show, with amplifications plus deletions, but not with deletions only. As far as we know this is the first evidence for genome-wide colocalization of germline susceptibility variants and somatically altered loci. Cyclin D1 is an interesting candidate cis somatic trait locus, or cis-STL, and we're investigating that further. I'll take any questions next. Oh, and sorry, I'd like to thank my collaborators on this effort, Scott Carter, Rameen Beroukhim, Craig Mermel, and Gaddy Getz, and my mentor Matthew Meyerson. Thanks.
Thank you, Marcin, that was a great talk. Question? Susan.
Hi, why did you choose to use just the heterozygotes and not the two homozygotes as well?
So in the setting of the heterozygous SNP we can come up with a simple statistical test that would detect some sort of selective advantage or some sort of effect. The alternative is to use a trend test or a case-control test, where we compare, for example, minor allele frequencies between patients that are amplifying that locus and patients that are not. That test tends to be more prone to population stratification and yields messier results, although it is potentially powerful. So both are good ways of approaching this problem. The ADT is more similar to the TDT in germline genomics, which again is also less prone to population stratification but is perhaps less powerful and requires more samples. We've actually tried both; we just achieved cleaner results with this test.
Yeah, just a question: have you taken copy number variation into account, or have you considered that as another possible way that you could get amplifications or deletions?
Oh, germline copy number polymorphism, right. In the context of this data we are only looking at somatic copy number events, but that certainly could be another driver of this kind of effect. We're also only looking at germline single nucleotide polymorphisms. A more general application of this strategy would be to look at other kinds of somatic variants and other kinds of germline variants, and that's certainly a great direction for additional exploration.
I have a quick question. For the GWAS peaks, have you attempted to further stratify them by looking at the heterozygotes and trying to discern whether the mode of inheritance is dominant versus recessive, and then repeating your analysis to see if there's a difference?
That would be a great analysis. Unfortunately, for a lot of the GWAS data that we're using we don't have the underlying genotypes; those are either not accessible or require additional access that we haven't been able to obtain yet. So that would definitely be an interesting analysis. Also, analyzing the summary statistics of some of these GWASs to examine loci that fell under the genome-wide significance threshold but happened to colocalize with a somatic copy number region would be an interesting way of finding additional germline effects.
Okay, thank you.
Thanks.