So, the title of my talk is genome-wide analysis of eQTL in breast cancer, but really what I'm talking about today is the interaction between genotype and phenotype, more specifically the interaction between germline genotype and breast cancer phenotype. The genome-wide association study is a widely used method for investigating this relationship between genotype and phenotype on a genomic scale. Breast cancer has been widely studied with genome-wide association studies, and if we look at the genome-wide association study catalog, we see about 50 risk alleles which predict risk of breast cancer. The question you might ask is how do we infer the mechanism of these risk alleles: how do these alleles lead to an increased risk of breast cancer, and by what means can we understand how variation at these loci has a functional consequence?

Using the eQTL framework, we treat gene expression as a phenotype. Using gene expression profiling methods such as RNA-seq or microarray, we can easily measure tens of thousands of features simultaneously, and this facilitates the investigation of the functional consequences of genetic variants at these loci.

Our eQTL analysis consisted of three parts: our germline genotype data, our tumor gene expression data, and our ER status data. This was from 382 TCGA invasive breast cancer cases from Caucasian individuals. Our germline SNP data came from an Affy 6 SNP array, and our expression data came from an Agilent 244K custom array. We took the about one million loci from the Affy 6 array, and we imputed them to about eight million loci for the analysis.

Getting from one million SNPs to eight million SNPs, like I said, is done using imputation, wherein we estimate genotypes for ungenotyped markers using a genotyped reference panel. In this case, 1000 Genomes was our reference panel. We used Beagle to infer haplotypes for unrelated individuals, and Minimac to implement the actual imputation.
That got us to about 16 million SNPs. We then took the eight million most variant.

Here's a plot of the first two principal components of our genotype data, and our 382 cases came from the red cluster you can see here.

We represented the interaction between gene expression and genotype with a linear model with parameters for genotype and ER status, which is our covariate. We used the R package MatrixEQTL to implement the eQTL analysis. MatrixEQTL uses large matrix operations to optimize the testing for every SNP-transcript pair, of which we had about 1.2 trillion, which is a lot. Along with using ER as a covariate, we also did eQTL detection in ER positive alone and ER negative alone.

Of the about eight million SNPs, we found that about 140,000 were significant eQTL. We also found that none of the 51 breast cancer risk alleles from the GWAS catalog were detected as eQTL. So we see here that there does not seem to be an association between risk allele status and eQTL status.

Another way we can think about our results is as a bipartite graph, wherein each eQTL can be represented as a locus pointing to a quantitative trait. If we think about it this way, we can compute the in-degree of our quantitative traits: how many loci per quantitative trait? The other way of thinking about it is out-degree: how many quantitative traits per locus? We can also look at connected regions of the graph, so which quantitative traits are connected to one another through SNPs, et cetera.

Here we have the in-degree distribution of our quantitative traits. We see most of the transcripts have one or two loci which they interact with, and a small number of transcripts are interacting with a large number of loci. Here's the other side of that. This is the out-degree distribution of the loci, and we see the same sort of thing, where a small number of loci are interacting with a large number of transcripts.
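The bipartite-graph view described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the analysis: the SNP and gene identifiers are made up, and the helper names (`degree_tables`, `degree_distribution`, `connected_components`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy eQTL list: each pair is (SNP id, transcript id).
# These identifiers are invented for illustration only.
eqtl_pairs = [
    ("snp_a", "gene_1"),
    ("snp_b", "gene_1"),
    ("snp_c", "gene_1"),
    ("snp_c", "gene_2"),
    ("snp_d", "gene_3"),
]


def degree_tables(pairs):
    """Return (in_degree, out_degree) for a bipartite SNP -> transcript graph.

    in_degree[transcript] = number of distinct loci pointing at that transcript
    out_degree[snp]       = number of distinct transcripts that locus points at
    """
    unique_pairs = set(pairs)  # ignore duplicated edges
    in_degree = Counter(transcript for _, transcript in unique_pairs)
    out_degree = Counter(snp for snp, _ in unique_pairs)
    return in_degree, out_degree


def degree_distribution(degree):
    """Map each degree value to the number of nodes that have that degree."""
    return dict(sorted(Counter(degree.values()).items()))


def connected_components(pairs):
    """Group SNPs and transcripts that are linked through shared edges (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for snp, transcript in pairs:
        # Tag nodes by type so a SNP and a transcript can never collide by name.
        parent[find(("snp", snp))] = find(("transcript", transcript))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


in_degree, out_degree = degree_tables(eqtl_pairs)
print(degree_distribution(in_degree))    # {1: 2, 3: 1}
print(degree_distribution(out_degree))   # {1: 3, 2: 1}
print(len(connected_components(eqtl_pairs)))
```

In this toy example one transcript has three loci and the other two have one each, which has the same shape as the distributions described above: most nodes have low degree and a few have high degree.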
Here are the quantitative traits with the highest in-degree. We see some interesting stuff. Prolactin is known to play a role in breast biology. MEN1 has been implicated in a variety of cancers. Another way we can visualize this is by taking a rolling mean of eQTLs across the genome, starting with chromosome one and going all the way to the X chromosome.

The last thing I wanted to talk about is these ER-dimorphic eQTLs. Like I said earlier, we ran the eQTL analysis in ER positive alone and ER negative alone, as well as with ER as a covariate. We found 32 eQTLs with an opposite sign of the interaction in the positive and the negative, and these are the six genes which were the transcripts from these eQTLs. Several of them seem to have roles in apoptosis, which I think warrants further investigation.

So finally, of the 1.2 trillion SNP-transcript interactions, about 375,000 eQTL were found. We found that risk allele status really does not predict eQTL status, but that ER status can interact with the direction of an eQTL. Finally, it does seem that germline genotype can lend insight into breast cancer phenotype.

I'd like to thank my boss, Andy Beck, and, from the Harvard School of Public Health, Aditi Hazra, Peter Kraft, John Quackenbush, and Connie Chen.

There's plenty of time, we'll take questions. Thank you. Questions for Nicholas?

Is there any correlation between these SNPs that are identified and the SNPs you get when you do genomic DNA?

I'm sorry?

I mean, these are all obtained from expression profiling, right?

The SNPs are from SNP chips, I think.

Yeah, okay.

So the eQTLs that are dimorphic between ER positive and ER negative, were they generally going, like for example for IGF1 receptor, was that more highly expressed in the ER negative? Was there a negative correlation or a positive correlation?

Does it? Right, between, like, the minor allele.
Yeah, which direction was it in ER positive versus ER negative? Were they consistent across?

Right, so it seems that most of the apoptosis-related transcripts seem to be lower with the minor allele in the ER negative, I believe, and then the converse in the ER positive. Does that make sense?

All right, I think I got it. Okay, thanks. I'll talk to you later.

Matthew?

Sure. I was very curious about your result, which is, at least naively thinking, surprising: that the germline risk alleles are not associated with eQTLs. Do you think this suggests alternative hypotheses for the role of these germline alleles in promoting cancer, other than being modulators of expression?

So I think it's entirely possible that these SNPs may lead to cancer, but then in cancer they do not predict any change in expression. That seems to be probably the most likely result, but...

Yeah.

I have a question. I want to understand, did you use adjacent normal tissue or the tumor tissue to look at the gene expression?

Gene expression is from tumor tissue.

So this could be affected by the stage of the tumor. Did you do analysis by stage?

We didn't do analysis by stage. We really only broke it down by ER status, to keep sample size large. But yeah, that certainly could play a role.

One last question.

Yes. In the case of ER-associated eQTL, have you checked whether separating pre-menopausal and post-menopausal cases could change things? Because estrogen levels vary by menopausal state, and that could affect gene expression.

Yeah, no, absolutely. We haven't looked at anything really besides ER status, but looking forward, there are a number of other covariates we could use.

Thank you. Thank you very much.
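As a footnote to the questions above about direction of effect, here is a minimal sketch of how an ER-dimorphic eQTL could be flagged for a single SNP-transcript pair. It is not the MatrixEQTL workflow from the talk. The data are simulated, the effect sizes are arbitrary, genotype is assumed to be coded as minor-allele count (0, 1, 2), and the helper names are hypothetical. It fits the genotype slope separately in ER-positive and ER-negative cases, and also the slope from a model with ER status as a covariate.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x (one predictor plus intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def slope_with_binary_covariate(x, y, covariate):
    """Slope of y on x in the model y ~ x + covariate, for a two-level covariate.

    Centering x and y within each covariate level and then regressing is
    equivalent to including the covariate as an indicator in the model.
    """
    sxy = 0.0
    sxx = 0.0
    for level in set(covariate):
        xs = [a for a, c in zip(x, covariate) if c == level]
        ys = [b for b, c in zip(y, covariate) if c == level]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxy += sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        sxx += sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


# Simulated data (assumption): 60 ER-positive and 60 ER-negative cases, with
# the minor allele raising expression in ER+ and lowering it in ER-.
rng = random.Random(0)
genotype, er_status, expression = [], [], []
for i in range(120):
    g = i % 3                      # minor-allele count: 0, 1 or 2
    er = 1 if i < 60 else 0        # 1 = ER positive, 0 = ER negative
    effect = 0.5 if er == 1 else -0.5
    genotype.append(g)
    er_status.append(er)
    expression.append(effect * g + 1.0 * er + rng.gauss(0, 0.2))


def stratum(values, level):
    """Select the entries of `values` belonging to one ER stratum."""
    return [v for v, er in zip(values, er_status) if er == level]


beta_pos = slope(stratum(genotype, 1), stratum(expression, 1))
beta_neg = slope(stratum(genotype, 0), stratum(expression, 0))
beta_adjusted = slope_with_binary_covariate(genotype, expression, er_status)

# Flag the pair as ER-dimorphic when the stratum slopes have opposite signs.
is_dimorphic = beta_pos * beta_neg < 0
print(beta_pos, beta_neg, beta_adjusted, is_dimorphic)
```

In this simulated case the two stratum slopes have opposite signs while the covariate-adjusted slope is close to zero, which suggests one reason to run the stratified analyses alongside the covariate model. A real analysis would also need significance testing and multiple-testing correction, which this sketch leaves out.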